Weblog of Kenny

Mar 15, 2006 at 07:23 o\clock

non-backtracking matching via structural representation

by: luzm

If we were using the universal rep to implement the reg-exp pattern matching


data Lit = Lit String
type Univ = [Lit]

-- for simplicity we just do a pair pattern
match :: [RE] -> Univ -> Maybe [Univ]
match (r1:r2) (l:ls) = if (l,r1') is a monomial of r1
                                then case (match (r1':r2) ls) of -- greedy match
                                        Just [v1:v2] -> Just ((l:v1):v2)
                                        Nothing ->  -- we are too greedy, back track
                                             if () is in r1
                                             then case (match r2 (l:ls)) of
                                                      Just v2 -> Just ([]:v2)
                                                      Nothing -> Nothing
                                else case (match r2 (l:ls)) of
                                                      Just v2 -> Just ([]:v2)
                                                      Nothing -> Nothing

Note that we need to back track because of greediness, e.g.
match [A*,A] [Lit "A"]

we first try to collect Lit "A" into the A*, but we later found that match [A ] [] fails, we back track...


However if we use structural runtime representation, this situation is circumvented.

type:  A*,A+ <: A* ==> A,(A*,A+)|A,A* <: () | A,A* ==>  A,((A*,A+)|A*) <: A,A* ==> ((A*,A+)|A*) <: A* ==>  ...
value: [],(A,[])~ [A] ==>           R (A,[])   ~  R A,[]  ==>    A,(R [])            ~   A,[] ==>           R []  ~ []