How PRISM optimizes UDFs via a DuckDB extension
This paper The Key to Effective UDF Optimization: Before Inlining, First Perform Outlining implements its ideas in DuckDB as an extension and I was curious specifically on how the authors access the SQL UDF to start working on it.
I got the code from Sam Arch’s github, and I followed the instructions there on building and running their extension.
In udf_transpiler_extension.cpp
in the LoadInternal
function, the PragmaFunction::PragmaCall
for UdfTranspilerPragmaFun
is registered.
inline String UdfTranspilerPragmaFun(ClientContext &context,
const FunctionParameters ¶meters) {
std::cout << "----- Running UdfTranspilerPragmaFun -----" << std::endl;
auto udfString = parameters.values[0].GetValue<String>();
return CompilerRun(udfString);
}
auto udf_transpiler_pragma_function = PragmaFunction::PragmaCall(
"transpile", UdfTranspilerPragmaFun, {LogicalType::VARCHAR});
ExtensionUtil::RegisterFunction(instance, udf_transpiler_pragma_function);
Then, to see when this was called, I traced back to the ClientContext
. The context will first call Query
and then that will call ParseStatements
. When the statement is parsed by ParseStatementsInternal
, it will call HandlePragmaStatements
.
void PragmaHandler::HandlePragmaStatements(ClientContextLock &lock, vector<unique_ptr<SQLStatement>> &statements) {
// first check if there are any pragma statements
bool found_pragma = false;
for (idx_t i = 0; i < statements.size(); i++) {
if (statements[i]->type == StatementType::PRAGMA_STATEMENT ||
statements[i]->type == StatementType::MULTI_STATEMENT) {
std::cout << "statements[i]->type: " << StatementTypeToString(statements[i]->type) << std::endl;
std::cout << "statements query: " << statements[i]->query << std::endl;
found_pragma = true;
break;
}
}
std::cout << "!!!!!!!! Found pragma: " << found_pragma << std::endl;
if (!found_pragma) {
// no pragmas: skip this step
return;
}
context.RunFunctionInTransactionInternal(lock, [&]() { HandlePragmaStatementsInternal(statements); });
}
The UDF is detected by duckdb as a PRAGMA statement.
Here we see the statement has the transpile
function is passed the UDF.
statements[i]->type: PRAGMA
statements query: pragma transpile('CREATE FUNCTION addAbs(val1 INT, val2 INT) RETURNS INT AS $$ BEGIN IF val1 < 0 THEN val1 = -val1; END IF; IF val2 < 0 THEN val2 = -val2; END IF; RETURN val1 + val2; END; $$ LANGUAGE PLPGSQL;');
!!!!!!!! Found pragma: 1
----- Running UdfTranspilerPragmaFun -----