Quantcast
Channel: NULL BITMAP by Justin Jaffray
Viewing all articles
Browse latest Browse all 138

A Sniff Test for Some Query Optimizers

$
0
0

One important part of query planning is performing transformations over queries. Today I want to see how a couple common databases perform on a completely made-up and unrepresentative benchmark.

I have a lot of caveats for this one. I need to emphasize before we start that the following test is not remotely indicative of overall optimizer quality, and it's not necessarily even a deficiency if an optimizer doesn't pass it, there's a lot of variables that go into deciding which transformations to do. I don't have a deep understanding of how all of these optimizers work, so it's possible they do the transformations I've listed in different situations, or in a cost-based way, or for their use-cases, they're not that useful. I'm also biased because I worked on CockroachDB and am only aware of several transformations because of that.

I have, out of my ass, plucked three SQL Query optimizations. Today we're going to survey a list (which I have also plucked out of my ass) of databases and see which of them do them.

The transformations I have chosen are:

  • An EXISTS to semijoin decorrelation,
  • DISTINCT elimination based on primary key, and
  • a common subexpression elimination.

EXISTS to semijoin decorrelation

Query

CREATETABLEab(aINTPRIMARYKEY,bINT);CREATETABLExy(xINTPRIMARYKEY,yINT);SELECT*FROMabWHEREEXISTS(SELECT*FROMxyWHEREa=x);

Explanation

As written, this query suggests evaluating the subquery SELECT * FROM xy WHERE a = x for each row. This can be transformed into a semijoin, or even a regular inner join to exploit the database's built-in join functionality to do it faster.

DuckDB

Verdict: ✅

EXPLAIN SELECT * FROM ab WHERE EXISTS (SELECT * FROM xy WHERE a = x);

┌─────────────────────────────┐
│┌───────────────────────────┐│
││       Physical Plan       ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│         HASH_JOIN         │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│            SEMI           ├──────────────┐
│  a IS NOT DISTINCT FROM a │              │
└─────────────┬─────────────┘              │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│          SEQ_SCAN         ││          SEQ_SCAN         │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             ab            ││             xy            │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             a             ││             x             │
│             b             ││                           │
└───────────────────────────┘└───────────────────────────┘

CockroachDB

Verdict: ✅

> EXPLAIN SELECT * FROM ab WHERE EXISTS (SELECT * FROM xy WHERE a = x);
           info
---------------------------
  distribution: local
  vectorized: true

  • merge join (semi)
  │ equality: (a) = (x)
  │ left cols are key
  │ right cols are key
  │
  ├── • scan
  │     missing stats
  │     table: ab@ab_pkey
  │     spans: FULL SCAN
  │
  └── • scan
        missing stats
        table: xy@xy_pkey
        spans: FULL SCAN
(17 rows)

Postgres

Verdict: ✅

EXPLAIN SELECT * FROM ab WHERE EXISTS (SELECT * FROM xy WHERE a = x);
                            QUERY PLAN
------------------------------------------------------------------
 Hash Join  (cost=60.85..99.39 rows=2260 width=8)
   Hash Cond: (ab.a = xy.x)
   ->  Seq Scan on ab  (cost=0.00..32.60 rows=2260 width=8)
   ->  Hash  (cost=32.60..32.60 rows=2260 width=4)
         ->  Seq Scan on xy  (cost=0.00..32.60 rows=2260 width=4)
(5 rows)

SQLite

Verdict: ❌

> EXPLAIN QUERY PLAN SELECT * FROM ab WHERE EXISTS (SELECT * FROM xy WHERE a = x);
QUERY PLAN
|--SCAN ab
`--CORRELATED SCALAR SUBQUERY 1
   `--SEARCH xy USING INDEX sqlite_autoindex_xy_1 (x=?)

DISTINCT elimination

Query

CREATETABLEab(aINTPRIMARYKEY,bINT);SELECTDISTINCT*FROMab;

Explanation

Since a is a primary key of the table ab, a must be non-null and distinct for every pair of rows. This means that every row in the table is distinct from every other row, and the query can be simplified to:

SELECT*FROMab;

DuckDB

Verdict: ❌

D EXPLAIN SELECT DISTINCT * FROM ab;

┌─────────────────────────────┐
│┌───────────────────────────┐│
││       Physical Plan       ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│       HASH_GROUP_BY       │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             #0            │
│             #1            │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│            ab.a           │
│            ab.b           │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         SEQ_SCAN          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             ab            │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             a             │
│             b             │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           EC: 0           │
└───────────────────────────┘

CockroachDB

Verdict: ✅

> EXPLAIN SELECT DISTINCT * FROM ab;
         info
-----------------------
  distribution: local
  vectorized: true

  • scan
    missing stats
    table: ab@ab_pkey
    spans: FULL SCAN
(7 rows)

Postgres

Verdict: ❌

=# EXPLAIN SELECT DISTINCT * FROM ab;
                         QUERY PLAN
------------------------------------------------------------
 HashAggregate  (cost=43.90..66.50 rows=2260 width=8)
   Group Key: a, b
   ->  Seq Scan on ab  (cost=0.00..32.60 rows=2260 width=8)
(3 rows)

SQLite

Verdict: ❌

> EXPLAIN QUERY PLAN SELECT DISTINCT * FROM ab;
QUERY PLAN
|--SCAN ab USING INDEX sqlite_autoindex_ab_1
`--USE TEMP B-TREE FOR DISTINCT

Common Subexpression Elimination

Query

CREATETABLEab(aINTPRIMARYKEY,bINT);SELECT(SELECTmax(a)FROMab),(SELECTmax(a)FROMab)+1,(SELECTmax(b)FROMab),(SELECTmax(b)fromab)+1;

Explanation

This one is a little contrived. I'm not even sure what the right way to do it is. I kind of just want to see what happens. I'll say a database passes if it doesn't scan the tables (even partially) four times.

DuckDB

Verdict: ❌

Oh god, this one is getting me sent to your spam filter for sure.

┌─────────────────────────────┐
│┌───────────────────────────┐│
││       Physical Plan       ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│          SUBQUERY         │
│       (SUBQUERY + 1)      │
│          SUBQUERY         │
│       (SUBQUERY + 1)      │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│       CROSS_PRODUCT       ├────────────────────────────────────────────────────────────────────────┐
└─────────────┬─────────────┘                                                                        │
┌─────────────┴─────────────┐                                                          ┌─────────────┴─────────────┐
│       CROSS_PRODUCT       │                                                          │    UNGROUPED_AGGREGATE    │
│                           ├───────────────────────────────────────────┐              │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│                           │                                           │              │         first(#0)         │
└─────────────┬─────────────┘                                           │              └─────────────┬─────────────┘
┌─────────────┴─────────────┐                             ┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│       CROSS_PRODUCT       │                             │    UNGROUPED_AGGREGATE    ││         PROJECTION        │
│                           ├──────────────┐              │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│                           │              │              │         first(#0)         ││             #0            │
└─────────────┬─────────────┘              │              └─────────────┬─────────────┘└─────────────┬─────────────┘
┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│    UNGROUPED_AGGREGATE    ││    UNGROUPED_AGGREGATE    ││         PROJECTION        ││      STREAMING_LIMIT      │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││                           │
│         first(#0)         ││         first(#0)         ││             #0            ││                           │
└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘
┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         PROJECTION        ││         PROJECTION        ││      STREAMING_LIMIT      ││    UNGROUPED_AGGREGATE    │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││                           ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             #0            ││             #0            ││                           ││          max(#0)          │
└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘
┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│      STREAMING_LIMIT      ││      STREAMING_LIMIT      ││    UNGROUPED_AGGREGATE    ││         PROJECTION        │
│                           ││                           ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│                           ││                           ││          max(#0)          ││             a             │
└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘
┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│    UNGROUPED_AGGREGATE    ││    UNGROUPED_AGGREGATE    ││         PROJECTION        ││         SEQ_SCAN          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│          max(#0)          ││          max(#0)          ││             a             ││             ab            │
│                           ││                           ││                           ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│                           ││                           ││                           ││             a             │
│                           ││                           ││                           ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│                           ││                           ││                           ││           EC: 0           │
└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘└───────────────────────────┘
┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         PROJECTION        ││         PROJECTION        ││         SEQ_SCAN          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             b             ││             b             ││             ab            │
│                           ││                           ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│                           ││                           ││             a             │
│                           ││                           ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│                           ││                           ││           EC: 0           │
└─────────────┬─────────────┘└─────────────┬─────────────┘└───────────────────────────┘
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         SEQ_SCAN          ││         SEQ_SCAN          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             ab            ││             ab            │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             b             ││             b             │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           EC: 0           ││           EC: 0           │
└───────────────────────────┘└───────────────────────────┘

(Aside, I love this plan format.)

CockroachDB

Verdict: ❌

> EXPLAIN SELECT                                                                                                                                                                           (SELECT max(a) FROM ab),                                                                                                                                                                                          (SELECT max(a) FROM ab)+1,                                                                                                                                                                                        (SELECT max(b) FROM ab),                                                                                                                                                                                          (SELECT max(b) from ab)+1;
                     info
-----------------------------------------------
  distribution: local
  vectorized: true

  • root
  │
  ├── • values
  │     size: 4 columns, 1 row
  │
  ├── • subquery
  │   │ id: @S1
  │   │ original sql: (SELECT max(a) FROM ab)
  │   │ exec mode: one row
  │   │
  │   └── • group (scalar)
  │       │
  │       └── • revscan
  │             missing stats
  │             table: ab@ab_pkey
  │             spans: LIMITED SCAN
  │             limit: 1
  │
  ├── • subquery
  │   │ id: @S2
  │   │ original sql: (SELECT max(a) FROM ab)
  │   │ exec mode: one row
  │   │
  │   └── • group (scalar)
  │       │
  │       └── • revscan
  │             missing stats
  │             table: ab@ab_pkey
  │             spans: LIMITED SCAN
  │             limit: 1
  │
  ├── • subquery
  │   │ id: @S3
  │   │ original sql: (SELECT max(b) FROM ab)
  │   │ exec mode: one row
  │   │
  │   └── • group (scalar)
  │       │
  │       └── • scan
  │             missing stats
  │             table: ab@ab_pkey
  │             spans: FULL SCAN
  │
  └── • subquery
      │ id: @S4
      │ original sql: (SELECT max(b) FROM ab)
      │ exec mode: one row
      │
      └── • group (scalar)
          │
          └── • scan
                missing stats
                table: ab@ab_pkey
                spans: FULL SCAN
(57 rows)

Postgres

Verdict: ❌

                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Result  (cost=76.92..76.94 rows=1 width=16)
   InitPlan 2 (returns $1)
     ->  Result  (cost=0.19..0.20 rows=1 width=4)
           InitPlan 1 (returns $0)
             ->  Limit  (cost=0.15..0.19 rows=1 width=4)
                   ->  Index Only Scan Backward using ab_pkey on ab  (cost=0.15..83.51 rows=2249 width=4)
                         Index Cond: (a IS NOT NULL)
   InitPlan 4 (returns $3)
     ->  Result  (cost=0.19..0.20 rows=1 width=4)
           InitPlan 3 (returns $2)
             ->  Limit  (cost=0.15..0.19 rows=1 width=4)
                   ->  Index Only Scan Backward using ab_pkey on ab ab_1  (cost=0.15..83.51 rows=2249 width=4)
                         Index Cond: (a IS NOT NULL)
   InitPlan 5 (returns $4)
     ->  Aggregate  (cost=38.25..38.26 rows=1 width=4)
           ->  Seq Scan on ab ab_2  (cost=0.00..32.60 rows=2260 width=4)
   InitPlan 6 (returns $5)
     ->  Aggregate  (cost=38.25..38.26 rows=1 width=4)
           ->  Seq Scan on ab ab_3  (cost=0.00..32.60 rows=2260 width=4)
(19 rows)

SQLite

Verdict: ❌

QUERY PLAN
|--SCAN CONSTANT ROW
|--SCALAR SUBQUERY 1
|  `--SEARCH ab USING COVERING INDEX sqlite_autoindex_ab_1
|--SCALAR SUBQUERY 2
|  `--SEARCH ab USING COVERING INDEX sqlite_autoindex_ab_1
|--SCALAR SUBQUERY 3
|  `--SEARCH ab
`--SCALAR SUBQUERY 4
   `--SEARCH ab

Conclusion

I dunno. I wouldn't advise drawing any actual conclusions from the outcome of this, besides it being sort of fun to think about. If you know any of these systems intimately and think I misrepresented them, let me know. Or if you have other fun tests like this, also let me know.


Viewing all articles
Browse latest Browse all 138

Latest Images

Trending Articles



Latest Images