Quantcast
Channel: NULL BITMAP by Justin Jaffray
Viewing all articles
Browse latest Browse all 138

The Little Planner Chapter 4: A Pushdown Party

$
0
0

Hi everybody, apologies, I didn't have time to put together a newsletter this week. In lieu of that, I found this chapter of a book called "The Little Planner" in a trash can in Berkeley, California and have done my best to get the dirt off of it. I'm not sure where it came from, but I hope it's an adequate substitute.

the little planner.png

What's an Expr?

It is an expression over scalar values, like numbers.

What's the definition of Expr we've been using so far?

It's this:

enumExpr{ColRef{id: usize},Int{val: i64},Eq{left: Box<Expr>,right: Box<Expr>},}

What's a "free variable?"

It is a column in an expression that is not bound.

You're kicking the can down the road: what is a "bound variable," then?

It is a column that is given its values within an expression and is not a parameter that comes from some other context.

Can you give me an example?

In the JavaScript expression,

function(a){returna+b;}

a is bound, because it doesn't refer to anything outside of the scope of the expression itself, but b is free.

If expr is an Expr, what is expr.free()?

It is the set of free variables in expr.

Implement Expr::free.

It's

fnfree(&self)-> HashSet<usize>{matchself{Expr::ColRef{id}=>{letmutset=HashSet::new();set.insert(*id);set}Expr::Int{..}=>HashSet::new(),Expr::Eq{left,right}=>{letmutset=left.free();set.extend(right.free());set}}}

What's a RelExpr?

It's an expression over relational values, like scans, filters, and joins.

What's the definition of RelExpr we've been using so far?

enumRelExpr{Scan{table_name: String,column_names: Vec<usize>,},Select{src: Box<RelExpr>,predicates: Vec<Expr>,},Join{left: Box<RelExpr>,right: Box<RelExpr>,predicates: Vec<Expr>,},Project{src: Box<RelExpr>,cols: HashSet<usize>,},}
implRelExpr{fnscan(table_name: String,column_names: Vec<usize>)-> Self{RelExpr::Scan{table_name,column_names,}}fnselect(self,predicates: Vec<Expr>)-> Self{RelExpr::Select{src: Box::new(self),predicates,}}fnjoin(self,other: Self,predicates: Vec<Expr>)-> Self{RelExpr::Join{left: Box::new(self),right: Box::new(other),predicates,}}fnproject(self,cols: HashSet<usize>)-> Self{RelExpr::Project{src: Box::new(self),cols,}}}

What is the attribute set of a relational expression?

It is the set of columns that each row in the evaluated expression has.

If rel is a RelExpr, what's rel.att()?

It is the attribute set of the RelExpr.

Show me its implementation.

fnatt(&self)-> HashSet<usize>{matchself{RelExpr::Scan{column_names,..}=>column_names.iter().cloned().collect(),RelExpr::Select{src,..}=>src.att(),RelExpr::Join{left,right,..}=>{letmutset=left.att();set.extend(right.att());set}RelExpr::Project{cols,..}=>cols.clone(),}}

Are "free variables" and "attribute sets" similar?

They're both sets of columns that we derive from an expression.

How are they different?

Well, "free variables" represent the sets of columns we need, and "attribute sets" represent the sets of columns we have.

Another difference is that "free variables" come from scalar expressions, and "attribute sets" come from relational expressions.

Could relational expressions have free variables?

Yes, if that relational expression itself had scalar expressions that referenced columns not produced by the relational expression.

Could scalar expressions have attribute sets?

No, since an attribute set is about the set of columns that are produced specifically by a relational expression.

What does it mean for an Expr to be "bound by" a RelExpr?

An Expr is bound by a RelExpr if all of its free variables are part of the RelExpr's attribute set.

True or false, @0 is bound by scan("a", [0]).

True, since scan("a", [0]) produces the column 0, it gives us sufficient context to evaluate @0.

True or false, @0 is bound by join([], scan("a", [0]), scan("x", [1])).

True, since a join's attribute set is the union of its inputs' attribute sets, we have the context necessary to evaluate @0.

True or false, @0 is bound by project([1], scan("a", [0, 1])).

False, since the project will throw away the column named 0, we won't have access to it and thus cannot evaluate the expression @0.

Why is that important?

When we attempt to evaluate the scalar expression, we do so in the context of a row. We can only evaluate a scalar expression if the row we have has values for all of the columns we reference.

What's a join?

It's the cross product of two relations, filtered according to some predicates.

Can you show me a join?

letleft=RelExpr::scan("a".into(),vec![0,1]);letright=RelExpr::scan("x".into(),vec![2,3]);letjoin=left.join(right,vec![Expr::col_ref(0).eq(Expr::col_ref(2)),Expr::col_ref(1).eq(Expr::int(100)),],);

This gives a query plan that looks something like this:

-> join(@0=@2 && @1=100)
  -> scan("a", [0, 1])
  -> scan("x", [2, 3])

Explain what this join does.

It looks at every pair of rows from a and x, and emits the ones that satisfy both a.0=x.2 and a.1=100.

How do those predicates differ?

The first one, a.0=x.2, requires both inputs to the join to be bound, but the second one, a.1=100 is bound by only a.

What does that mean?

We don't need the values from x to evaluate if a row satisfies a.1=100.

Does that suggest anything we could do to this query?

We could execute that predicate before doing the join. We could write it like this:

letjoin=left.select(vec![Expr::col_ref(1).eq(Expr::int(100))]).join(right,vec![Expr::col_ref(0).eq(Expr::col_ref(2))]);

For a query plan that looks like this:

-> join(@0=@2)
  -> select(@1=100)
    -> scan("a", [0, 1])
  -> scan("x", [2, 3])

Can we automate this process?

Probably.

How might we do that?

I guess we'd have to check if each predicate is bound by either side of the join alone.

Write bound_by.

implExpr{fnbound_by(&self,rel: &RelExpr)-> bool{self.free().is_subset(&rel.att())}}

How can we use this?

We can intercept calls to join:

fnjoin(self,other: Self,mutpredicates: Vec<Expr>)-> Self{foriin0..predicates.len(){ifpredicates[i].bound_by(&self){// We can push this predicate down.letpredicate=predicates.swap_remove(i);returnself.select(vec![predicate]).join(other,predicates);}ifpredicates[i].bound_by(&other){// We can push this predicate down.letpredicate=predicates.swap_remove(i);returnself.join(other.select(vec![predicate]),predicates);}}RelExpr::Join{left: Box::new(self),right: Box::new(other),predicates,}}

Now show me a query.

letleft=RelExpr::scan("a".into(),vec![0,1]);letright=RelExpr::scan("x".into(),vec![2,3]);letjoin=left.join(right,vec![Expr::col_ref(0).eq(Expr::col_ref(2)),Expr::col_ref(1).eq(Expr::int(100)),Expr::col_ref(3).eq(Expr::int(100)),],);

Running this outputs this:

-> join(@0=@2)
  -> select(@1=100)
    -> scan("a", [0, 1])
  -> select(@3=100)
    -> scan("x", [2, 3])

What happens with the following query?

letjoin=left.join(right,vec![Expr::col_ref(0).eq(Expr::int(100)),Expr::col_ref(1).eq(Expr::int(200)),],);

Let me see.

-> join()
  -> select(@1=200)
    -> select(@0=100)
      -> scan("a", [0, 1])
  -> scan("x", [2, 3])

Oh. Kind of ugly. We need to collapse those Selects. We can intercept calls to select too to flatten them.

fnselect(self,mutpredicates: Vec<Expr>)-> Self{ifletRelExpr::Select{src,predicates: mutpreds,}=self{preds.append(&mutpredicates);returnsrc.select(preds);}RelExpr::Select{src: Box::new(self),predicates,}}
-> join()
  -> select(@0=100 && @1=200)
    -> scan("a", [0, 1])
  -> scan("x", [2, 3])

What else should we do to fully simplify these expressions?

We'd need to do a couple more simplifications to get these trees fully simplified:

  • Selects above Joins should merge their predicate sets,
  • Select should get pushed through Project where possible,
  • Project should get pushed through Join where possible.

Do that.


Apologies to Friedman and Felleisen.


NULL BITMAP is free. If you earn a software engineering salary and think you've gotten some value out of these posts, please consider tossing a couple dollars to the Palestine Children's Relief Fund.


Viewing all articles
Browse latest Browse all 138

Trending Articles