Weblog of Kenny

Oct 20, 2006 at 10:07 o\clock

BiXid Bidirectional XML transformation.

by: luzm

Their key idea is based on "program by relation" paradigm. We recall their example converting Netscape book and XBel bookmark in the paper as follows,

relation top =

Html [ Head [String], Body [H1 [ t :: String ] , Dl [nc] ] ]

<->

Xbel [ Title [ t :: String ], xc ]

where contents (nc, xc)

relation contents =
( nb | nf )*
<->
( xb | xf )*
where bookmark(nb,xb), folder(nf, xf)

-- note that the pattern matching assumes indeterministic matching. nb and nf or xb or xf can be in any order.

relation bookmark =
( Dt [ A[Href [ url :: String], title ::String]] )
<->
(Bookmark [Href [url :: String],Title[ title ::String]] ] )

relation folder =
Dt [H3 [title :: String], Dl [ nc] ]
<->
Folder[Title [title ::String], xc ]
where contents (nc, xc)

In the above example, top describes nestscape and XBel bookmarks are in relation, if the H1 from Hthml is the same as the title of Xbel, and their contents are in relation. contents relation ties the nescape bookmark content with the xbel bookmark contents. Now that the nb and nf, xb and xf can be in any order. This assume the values matches with the pattern any order. nb and xb must be in bookmark relation. and xf and nf must be in folder relation. There is a recursion in the folder relation.

  The semantics is described as follows in our own sytax. We only consider the forward transformation, the backward transformation is dual to it.

for relation

relation r = P <-> Q  where W

we have two phases:

parsing phase:  W |- v in P => venv 

value v matches with pattern P under the extra relation W, producing value binding venv.

Note that venv contains all possible matching. For instamce

A in (x :: A?, y :: A?) => { x -> A, y -> () } or  { x -> (), y -> A }.

unparsing phase: W | - venv <= Q ni v'

that is under W, value binding venv unparses Q producing value v'.

Technical aspects:

They use parse automata which is close to the regular expression tree automata with variable from XDuce's simpler design paper. 

Mar 15, 2006 at 07:23 o\clock

non-backtracking matching via structural representation

by: luzm

If we were using the universal rep to implement the reg-exp pattern matching


data Lit = Lit String
type Univ = [Lit]

-- for simplicity we just do a pair pattern
match :: [RE] -> Univ -> Maybe [Univ]
match (r1:r2) (l:ls) = if (l,r1') is a monomial of r1
                                then case (match (r1':r2) ls) of -- greedy match
                                        Just [v1:v2] -> Just ((l:v1):v2)
                                        Nothing ->  -- we are too greedy, back track
                                             if () is in r1
                                             then case (match r2 (l:ls)) of
                                                      Just v2 -> Just ([]:v2)
                                                      Nothing -> Nothing
                                else case (match r2 (l:ls)) of
                                                      Just v2 -> Just ([]:v2)
                                                      Nothing -> Nothing

Note that we need to back track because of greediness, e.g.
match [A*,A] [Lit "A"]

we first try to collect Lit "A" into the A*, but we later found that match [A ] [] fails, we back track...


However if we use structural runtime representation, this situation is circumvented.

type:  A*,A+ <: A* ==> A,(A*,A+)|A,A* <: () | A,A* ==>  A,((A*,A+)|A*) <: A,A* ==> ((A*,A+)|A*) <: A* ==>  ...
value: [],(A,[])~ [A] ==>           R (A,[])   ~  R A,[]  ==>    A,(R [])            ~   A,[] ==>           R []  ~ []                                   
             


Feb 24, 2006 at 11:17 o\clock

PiDuce

by: luzm

Summary for paper "A basic contract language for web services"

The authors have proposed a schema language to describe webservices. The essence is similar to CDuce's type with the following relation on channels.
<R>i means a channel receives R as input.
<R>o means a channel produces R as output.
<R>io means a channel receives and produces R.

So <R>i = R -> _|_, <R>o = Any -> R and <R>io= R -> R

The problem defined in this paper is that validating a webservice request (a value) w.r.t to the schema happens at run time. Because channel does not carry structural information but type annotation, subtyping check must be performed at run time, which is EXP.

The solution is to reconsider a subset of regular hedges, the rough idea is to disallow A*,A*. If we normalize it, we get A,A*,A*| A,A* | (), since the first two monomails shares the same leading literal, with antimirov's algorithm we would combine the two. But this leads to exponential blow up the size of the proof. It is worse if regular hedge is considered. So the authors decided to disallow this kind of type. Assume the subset should be deterministic by normalization without combining and monomials. (Though this is my rough understanding.)
TODO: check whether their definition is complete!!!

They have a nice result showing that considering this subset of type, would produce an efficient proving algorithm in polynormial time.





Dec 6, 2005 at 10:04 o\clock

Design Decisions on polymorphic XML transformation. Part 2

by: luzm

I continue the write up in this new entry.

It seems that

1. Using indeterministic maching policy greatly simplifies the type system.
2. Programmer supplies explicit type information when type application is ambiguous


Here are some useful examples using polymorphic regular expression types.


Example 1.

map :: (a ->b) -> a* -> b*
map f :: (a->b) = let g :: a* -> b*
                              g (x :: a, xs :: a*) = ((f x), (map f xs))
                              g () = ()
                          in  g


Example 2.

filter :: (a|b)* -> a*
filter xs = let f :: (a|b) -> a?
                    f (x :: a) = x
                    f (x :: b) = ()
                in map f xs

Example 3.

zip :: [a]+ -> [a+]
zip (x :: [a], y :: [a], z :: [a]*) = zip ((prelude.zip x y),z)
zip (x :: [a])                         = x

d

Nov 25, 2005 at 08:54 o\clock

Design Decisions on polymorphic XML transformation.

by: luzm

I summarize the issues and design decisions as follows.

Issue 1: Ambiguous Instantiation.

regtype A = A[]
regtype B = B[]
f :: (a,B,b) -> b
f (x :: (a,B), y :: b) = y

assume we apply f to value of type (B,B,A). There are at least two  "ambiguous" type instances: (1) a = (), b = (B,A), and (2) a = B, b = A.

It either leads to loss of coherence or type unsoundness.

Design Decision 1:
 a) Perform ambiguous check on type annotations. (Employed by  Hosoya et al. POPL 2005)
 b) Type application is provided by the programmer. (Employed by Vouillon POPL 2006) and Sulzmann and Lu PLAN-X 2006)

Issue 2: Pattern Inference.

f :: a* -> ()
f (x :: a+) = ()
f (x :: a*) = ()                --(2)

If we assume patterns are processed from top
to bottom, we can conclude that
type environment for pattern (2) is x::().

Design Decision 2:
 a) Perform no pattern inference, and takes the pattern annotation litterally.
   Pro :  it is simple to translation, it should work for most of the realistic examples (Really?)
   Con: It rejects some programs. The programmer needs to provide proper pattern annotations.
 b) Perform a conservative inference.
   Pro: It is not hard to extend intersection and difference to handle polymorphic variables. The system is sound.
   Con: The system is incomplete. In case of inference failure, let the unsolvable constraints flow out as part of the function type annotations. It requires the programmere to provide addtional information to satisfy the constraints. And the generated constraints potentially lead to context-free solution.

Issue 3: Inference under (In)deterministic matching policy.
Inference w.r.t a deterministic matching policy helps to type more programs.

f :: a* -> ()
f (x::a*, y ::a*) = y

Design Decision 3: inference under deterministic
Pro: as mentioned above.
Con:  It blows up the type annotation. In case of type error, the constraints might not be managible by the programmer.

Issue 4: Exhaustiveness Check
It is undecidable in general.

Design decision: Simple conservative check.
  f :: r1 -> r2
  f [pi = ei]
r1 <= \Cup \stript(pi)

Cons:  The algorithm is incomplete.  We might reject useful programs, (we need to compare this against Hosoya's approach).  Otherwise, we let the unsatisfiable constraints flow out as annotation. I

Question : n case of type error, are the constraints managible by the programmers?