Hi everybody, apologies, I didn't have time to put together a newsletter this week. In lieu of that, I found this chapter of a book called "The Little Planner" in a trash can in Berkeley, California and have done my best to get the dirt off of it. I'm not sure where it came from, but I hope it's an adequate substitute.
What's an Expr
?
It is an expression over scalar values, like numbers.
What's the definition of Expr
we've been using so far?
It's this:
enumExpr{ColRef{id: usize},Int{val: i64},Eq{left: Box<Expr>,right: Box<Expr>},}
What's a "free variable?"
It is a column in an expression that is not bound.
You're kicking the can down the road: what is a "bound variable," then?
It is a column that is given its values within an expression and is not a parameter that comes from some other context.
Can you give me an example?
In the JavaScript expression,
function(a){returna+b;}
a
is bound, because it doesn't refer to anything outside of the scope of the expression itself, but b
is free.
If expr
is an Expr
, what is expr.free()
?
It is the set of free variables in expr
.
Implement Expr::free
.
It's
fnfree(&self)-> HashSet<usize>{matchself{Expr::ColRef{id}=>{letmutset=HashSet::new();set.insert(*id);set}Expr::Int{..}=>HashSet::new(),Expr::Eq{left,right}=>{letmutset=left.free();set.extend(right.free());set}}}
What's a RelExpr
?
It's an expression over relational values, like scans, filters, and joins.
What's the definition of RelExpr
we've been using so far?
enumRelExpr{Scan{table_name: String,column_names: Vec<usize>,},Select{src: Box<RelExpr>,predicates: Vec<Expr>,},Join{left: Box<RelExpr>,right: Box<RelExpr>,predicates: Vec<Expr>,},Project{src: Box<RelExpr>,cols: HashSet<usize>,},}
implRelExpr{fnscan(table_name: String,column_names: Vec<usize>)-> Self{RelExpr::Scan{table_name,column_names,}}fnselect(self,predicates: Vec<Expr>)-> Self{RelExpr::Select{src: Box::new(self),predicates,}}fnjoin(self,other: Self,predicates: Vec<Expr>)-> Self{RelExpr::Join{left: Box::new(self),right: Box::new(other),predicates,}}fnproject(self,cols: HashSet<usize>)-> Self{RelExpr::Project{src: Box::new(self),cols,}}}
What is the attribute set of a relational expression?
It is the set of columns that each row in the evaluated expression has.
If rel
is a RelExpr
, what's rel.att()
?
It is the attribute set of the RelExpr
.
Show me its implementation.
fnatt(&self)-> HashSet<usize>{matchself{RelExpr::Scan{column_names,..}=>column_names.iter().cloned().collect(),RelExpr::Select{src,..}=>src.att(),RelExpr::Join{left,right,..}=>{letmutset=left.att();set.extend(right.att());set}RelExpr::Project{cols,..}=>cols.clone(),}}
Are "free variables" and "attribute sets" similar?
They're both sets of columns that we derive from an expression.
How are they different?
Well, "free variables" represent the sets of columns we need, and "attribute sets" represent the sets of columns we have.
Another difference is that "free variables" come from scalar expressions, and "attribute sets" come from relational expressions.
Could relational expressions have free variables?
Yes, if that relational expression itself had scalar expressions that referenced columns not produced by the relational expression.
Could scalar expressions have attribute sets?
No, since an attribute set is about the set of columns that are produced specifically by a relational expression.
What does it mean for an Expr
to be "bound by" a RelExpr
?
An Expr
is bound by a RelExpr
if all of its free variables are part of the RelExpr
's attribute set.
True or false, @0
is bound by scan("a", [0])
.
True, since scan("a", [0])
produces the column 0
, it gives us sufficient context to evaluate @0
.
True or false, @0
is bound by join([], scan("a", [0]), scan("x", [1]))
.
True, since a join's attribute set is the union of its inputs' attribute sets, we have the context necessary to evaluate @0
.
True or false, @0
is bound by project([1], scan("a", [0, 1]))
.
False, since the project
will throw away the column named 0
, we won't have access to it and thus cannot evaluate the expression @0
.
Why is that important?
When we attempt to evaluate the scalar expression, we do so in the context of a row. We can only evaluate a scalar expression if the row we have has values for all of the columns we reference.
What's a join?
It's the cross product of two relations, filtered according to some predicates.
Can you show me a join?
letleft=RelExpr::scan("a".into(),vec![0,1]);letright=RelExpr::scan("x".into(),vec![2,3]);letjoin=left.join(right,vec![Expr::col_ref(0).eq(Expr::col_ref(2)),Expr::col_ref(1).eq(Expr::int(100)),],);
This gives a query plan that looks something like this:
-> join(@0=@2 && @1=100)
-> scan("a", [0, 1])
-> scan("x", [2, 3])
Explain what this join does.
It looks at every pair of rows from a
and x
,
and emits the ones that satisfy both a.0=x.2
and a.1=100
.
How do those predicates differ?
The first one, a.0=x.2
, requires both inputs to the join to be bound, but the second one, a.1=100
is bound by only a
.
What does that mean?
We don't need the values from x
to evaluate if a row satisfies a.1=100
.
Does that suggest anything we could do to this query?
We could execute that predicate before doing the join. We could write it like this:
letjoin=left.select(vec![Expr::col_ref(1).eq(Expr::int(100))]).join(right,vec![Expr::col_ref(0).eq(Expr::col_ref(2))]);
For a query plan that looks like this:
-> join(@0=@2)
-> select(@1=100)
-> scan("a", [0, 1])
-> scan("x", [2, 3])
Can we automate this process?
Probably.
How might we do that?
I guess we'd have to check if each predicate is bound by either side of the join alone.
Write bound_by
.
implExpr{fnbound_by(&self,rel: &RelExpr)-> bool{self.free().is_subset(&rel.att())}}
How can we use this?
We can intercept calls to join
:
fnjoin(self,other: Self,mutpredicates: Vec<Expr>)-> Self{foriin0..predicates.len(){ifpredicates[i].bound_by(&self){// We can push this predicate down.letpredicate=predicates.swap_remove(i);returnself.select(vec![predicate]).join(other,predicates);}ifpredicates[i].bound_by(&other){// We can push this predicate down.letpredicate=predicates.swap_remove(i);returnself.join(other.select(vec![predicate]),predicates);}}RelExpr::Join{left: Box::new(self),right: Box::new(other),predicates,}}
Now show me a query.
letleft=RelExpr::scan("a".into(),vec![0,1]);letright=RelExpr::scan("x".into(),vec![2,3]);letjoin=left.join(right,vec![Expr::col_ref(0).eq(Expr::col_ref(2)),Expr::col_ref(1).eq(Expr::int(100)),Expr::col_ref(3).eq(Expr::int(100)),],);
Running this outputs this:
-> join(@0=@2)
-> select(@1=100)
-> scan("a", [0, 1])
-> select(@3=100)
-> scan("x", [2, 3])
What happens with the following query?
letjoin=left.join(right,vec![Expr::col_ref(0).eq(Expr::int(100)),Expr::col_ref(1).eq(Expr::int(200)),],);
Let me see.
-> join()
-> select(@1=200)
-> select(@0=100)
-> scan("a", [0, 1])
-> scan("x", [2, 3])
Oh. Kind of ugly. We need to collapse those Select
s. We can intercept calls to select
too to flatten them.
fnselect(self,mutpredicates: Vec<Expr>)-> Self{ifletRelExpr::Select{src,predicates: mutpreds,}=self{preds.append(&mutpredicates);returnsrc.select(preds);}RelExpr::Select{src: Box::new(self),predicates,}}
-> join()
-> select(@0=100 && @1=200)
-> scan("a", [0, 1])
-> scan("x", [2, 3])
What else should we do to fully simplify these expressions?
We'd need to do a couple more simplifications to get these trees fully simplified:
Select
s aboveJoin
s should merge their predicate sets,Select
should get pushed throughProject
where possible,Project
should get pushed throughJoin
where possible.
Do that.
Apologies to Friedman and Felleisen.
NULL BITMAP is free. If you earn a software engineering salary and think you've gotten some value out of these posts, please consider tossing a couple dollars to the Palestine Children's Relief Fund.