After reading
[1] Regular Tree Language Recognition With Static Information A. Frisch ICALP 2005
[2] Type-based Optimization for Regular Patterns M. Y. Levin and B. C. Pierce DBPL 2005
I feel like making a summary and comparison on this topic.
First of all, both works above deal with the problem "Given an input
value, a pattern, how to match the value against the pattern
efficiently?"
The key idea is to make use of the static information on the input
values, that is the input type. For instance, let's take a concretized
example from [1].
f :: (A,C) -> B
f (x as (B,C)) = x
f (x as (A,C)) = B
Naively, the compiled code will first test whether the input value is
matched with (B,C) then followed by another test against (A,C).
However, knowing the static information (input values can only be of
type (A,C). The first test is skipped. Our (XHaskell) implementation
could get this feature for free, because we build on a type driven
translation. In our translation, the pattern is "specialized"
(intersected) by the input type during the type inference phase, before
being translated into ML/Haskell case expression. For instance, the
above program is translated to
f :: (A,C) -> B
f v = case (d2 (A,C) (A,C) v) of
Just vp2 -> ...
Nothing -> ...
The first test can be skipped because we know that statically the precise type of the first pattern is (A,C) \cap (B,C) = \phi.
Another example shows that we will get some other features for free thanks to the type inference.
f :: (A,(D|B)) -> B
f (x as ((A|B),(E|D))) = B
f (x as ((A|D),(F|B))) = B
Exploiting the static information, we know that the two pattern clauses can be merged.
Infer 1st pattern : (A,(D|B)) \cap ((A|B),(E|D)) = (A,D)
Infer 2nd pattern : ((A,(D|B)) - (A,D)) \cap ((A|D),(F|B)) = (A,B)
Hence (A,D) | (A.B) is merged into (A,(D|B)).
These two examples show that some optimizations techniques introduced in [1] are achieved by performing pattern inference.
However we don't get everything for free.
For instance,
f :: ((A,B)|(C,D)) -> ...
f (A,B) = ...
f (C,D) = ...
For simplicity, the pattern variables and the pattern bodies are omitted.
we would compile this program as follows,
f :: (A,B)|(C,D) -> ...
f v = case (d1 v) of
Just vp1 -> ...
Nothing -> case (d2 v) of
Just vp2 -> ....
Nothing -> ...
where
A \leq_di
A
--di is identity
B \leq_di B
--------------------------------------
(A,B) \leq_d3 (A,B)
-------------------------------------
(A,B) \leq_d1 (A,B)|(C,D)
and note that because of the normalization, the derived `downcast'
function is optimized. Namely, in the construction of function d1, when
we know that the first component of the value pair matched with A, we
then only test whether the second component of the value pair matched
with B, and we don't need to bother with D. This is what Frisch
achieved via NUA in [1]. However, in case of a tree value, it troubles
us a lot as compared with his approach. Consider program,
f::((L[A],B)|(L[C],D)) -> ....
f (L[A],B) = ...
f (L[C],D) = ...
It is translated into
f :: ((Lab L A),B)|(Lab L B)) -> ...
f v = case d1 v of
Just vp1 -> ...
Nothing -> case d2 v of
Just vp2 -> ...
Nothing -> ...
where
A \leq_d4 (A|C) B \leq_d5 (B|D)
-----------------------------------------------------------
(L[A],B)|(L[C],D) \leq_u3 (L[A|C],(B|D))
......
(L[A],B) \leq_d3
(L[A|C],(B|D))
--- apply special rule
------------------------------------------------------------
(L[A],B) \leq_d1 (L[A],B)|(L[C],D)
Because we need to deal with containment among trees, which might be
recursively defined, we need to go through a special proof rule which
we mentioned in the paper appears in APLAS 2004.
Due to this special rule, we lose nice static information, because we
now need to first upcast the rhs into a bigger type and downcast it to
the lhs. It results in d4 and d5. If we know the first component of the
value pair is A, still we can't optimize d5 to test the second
component against B.
This
inefficiency is solely introduced by the above special rule, why can't
we just use a rule which is similar to the pair example? We could only
use such a rule, when the lhs of the inequality is finite, i.e. there
is no recursion nor kleene star in the pattern, we are able to "switch
off" the above special rule.
For instance, instead of the rule above, we could say the following
A \leq_d3 A
B \leq_d4 B
------------------------------------------
(L[A],B) \leq_d2 (L[A],B)
------------------------------------------------------------
(L[A],B) \leq_d1 (L[A],B)|(L[C],D)
of course, we need to apply distributivity law to the lhs when the union operator is not at the top most level.
In [2], Levin and Pierce propose that we should not test the the value tree strictly from left to right.
For example,
f :: ((L[B|C],A)|(L[C],B)|(L[B],C)) -> ...
f (L[B|C],A) = ...
f (L[C],B) = ...
f (L[B],C) = ...
The intuition behind is that given an input value
L[v1],v2, we should test v2 against A,B and C to
to decide which pattern it should match, instead of match v1 against
(B|C), C and B, which is not sufficient to determine which pattern is
matched.
Based on my minimal understanding, they seem to look at the branch
factor of the first subtree and the second subtree. in this example,
the branch factor for the first subtree is 2, and the factor for the
second subtree is 3. Therefore, the second subtree is tested first.
This makes sense, because when we layout a search tree for a problem,
we always choose a node which has more children to be the root.
Can we apply this in our settings? From my naive point of view, it is
possible because we could invent a new way to normalize the rhs (the
input type). For instance, there are two ways to normalize the rhs.
((L[B|C],A)|(L[C],B)|(L[B],C)) ~> (L[B],(A|B)) | (L[C],(A|B)) --(1)
((L[B|C],A)|(L[C],B)|(L[B],C)) ~> ((L[B|C],A)|(L[C],B)|(L[B],C)) --(2)
In (1) we group monomials based on first subtree, in (2) we group them
based on second subtree. For sure we will pick (2) because it yields
more groups. Under the same reasoning, we could save some tests.
Let's take one step up, can we generalize this as follows? "Given
patterns p1|...|pn, we want to look for a most common generalization
gi, where p1|...|pn ~> x1,g1,y1 | ... |xn,gn,yn. By examing gi we
could decide which pattern is matched."
If this is valid, we can have an optimized translation as follows
f v = coerce v to (|xi,g,|yi)
test the value under g to match with
g1 -> distribute value to p1
... ->
gn ->
In [2], there is another simple optimization technique is worthwhile
being mentioned. In case a test on monomial, we can skip the test on
the label at run time, because at compile time we already known that
the label must match. For instance
r1' \leq_d1 r1
r2' \leq_d2 r2
--------------------------
L[r1'],r2' \leq_d L[r1],r2
d ((Lab l v),v') = case (d1 v) of
Just u -> case (d2 v) of
Just u' -> Just ((Lab l u),u')
...
We save the test on label l by using a new run-time encoding for label type, data Lab l c = data Lab l c