MonetDB/X100 -- Hyper-Pipelining Query Execution (P. Boncz, et al., CIDR 2005) -

July 7, 2024 · notes, databases, monetdb, query execution

MonetDB/X100 -- Hyper-Pipelining Query Execution (P. Boncz, et al., CIDR 2005)

MonetDB/X100: Hyper-Pipelining Query Execution

Overview:

The Volcano iterator model with tuple-at-a-time query execution is inefficient for super-scalar CPUs because of high interpretation overhead and it hinders the compiler from finding parallelism opportunities.
The column-at-a-time execution model of MonetDB improves upon the above problems with the tuple-at-a-time execution model, but it requires materialization of the entire column which causes memory traffic to limit the efficiency of the CPU.
In this paper, the authors introduce a hybrid approach which is like the Volcano iterator model but uses batches of tuples instead. This helps to balance low interpretation overhead with high memory bandwidth, overall increasing CPU efficiency.

Key takeaways:

Super-scalar CPUs have multiple pipelines that can increase CPU performance, but can slow things down when there are dependencies between instructions and when branch prediction is wrong. Also, compilers can optimize code to get CPU utilization (i.e. loop pipelining).
The authors ran TPC-H Query 1 against MySQL, and they observed < 10% of the total execution time was spent on real work for computing the query while much of the execution time was spent on work like copying data in and out of MySQL’s record representation – tuple-at-a-time overhead. They also observed that compiler optimizations like loop pipelining could not be used because the system did not operate on arrays of tuples.
The authors also ran this query against MonetDB/MIL, which uses the column-at-a-time execution model and observed that the full column materialization caused the query to be memory bound rather than CPU bound. They supported this claim with an experiment that eliminated memory traffic. They ran the TPC-H Query 1 at a small scale factor such that all used columns and intermediate materialization results fit in the CPU cache, and they observed bandwidth doubling or more.
A new query processor called X100 is introduced which is a Volcano style query execution model, but on batches of columnar data. This keeps interpretation overhead low, enables the compiler to find optimizations like loop pipelining, and lowers memory traffic by lowering the amount of data that needs to be materialized. They test the approach on TPC-H at 100 GB and achieve up to two orders of magnitude faster performance than contemporary DBMSs.