Tuesday, June 26, 2007

Search in Haskell

I've been playing around today with search in Haskell, and besides from some overly technical papers and a confusing explanation on the YAHT wiki book, I haven't found anything useful. So I though I'd put up what I could figure out.

First, let's use type classes to get some modularity. Search can be over trees or graphs or whatever, basically we just need states and a function to give us the children of a node. We also are going need to compare states to check if we have reached the goal.

> class (Ord a) => SearchState a where
> children :: a -> [a]

For instance, let's say we are trying to solve the numeric maze given in the haskell wiki. In this problem our state is just an integer.

> newtype NumMaze = NumMaze Integer deriving (Show, Eq, Ord)

And here is the instantiation. Our operations to generate children are (x+2), (x*2), (x/2). So

> instance SearchState NumMaze where
> children (NumMaze n) =
> map NumMaze $
> [n+2, n*2]++[n `div` 2 | 2 `divides` n]
> where divides a b = b `mod` a == 0

Now we can write very generic functions for the search operations. We'll start with a monadic depth first search. Depth first search is very natural for functional programming (if we don't worry about blowing the stack). We dive in with recursion. There is good explanation of this on the haskell wikibook here. Instead of using a graph, we use our search state and simplify the code by using a fold.

> dfs :: (SearchState a,MonadPlus m) => a -> a -> m[a]
> dfs start target
> | start == target = return [start]
> | otherwise =
> foldr mplus mzero $
> [ dfs c target >>= \path -> return (start:path)) |
> c <- children start] The code has two cases. If we have reached the goal, it returns a list of success with the result. Otherwise, in the list comprehension it tries searching each child. If it finds a successful path, it adds it returns it. You might wonder where we handle failure. If there are no children, the fold returns mzero which will propagate failure to its parents. When the parent gets a failure, mplus will try a different path, and we have backtracking. Next we'll try breadth first search. Here is one way to do it by passing a queue as an accumulator. > bfs :: (SearchState a,Monad m) => a -> a -> m [a]
> bfs start target = bfs' [[start]] >>= return . reverse
> where
> bfs' [] = fail "No Path"
> bfs' ((s:r):ns)
> | s == target = return (s:r)
> | otherwise = bfs' (ns ++[c:s:r |c<-children s]) We can make this faster by adding a set to keep track of states that we have already visited. > import Data.Set (empty, notMember, union, fromList)

> bfs2 :: (SearchState a,Monad m) => a -> a -> m [a]
> bfs2 start target = bfs' [[start]] empty >>= return .reverse
> where
> bfs' [] _ = fail "No Path"
> bfs' ((s:r):ns) seen
> | s == target = return (s:r)
> | otherwise = bfs' (ns ++new') seen'
> where
> new' = [c:s:r|
> c <- children s, > c `notMember` seen]
> seen' = seen `union` (fromList $ map head new')

So these functions are nice, and they are how I would write them in O'Caml. But they just feel too ugly for haskell. I know, I'm getting spoiled. So I thought I would give a shot at a lazier version. The rest of this entry is my very basic understanding of the Functional Pearl Breadth-First Combinators.

First, depth first search. I visualize it like this. Imagine, that the states form a tree, and we want to first transform the tree to a stream, and then process it linearly. Haskell makes this absurdly easy. First, imagine that we can linearize all the children searches. This gives us a matrix-like list of streams (since we are doing DFS, we hope they are not infinite, but that does not break the formulation) i.e.

[[Child1, Child1.1, Child1.2 ...],
[Child2, Child2.1, ...],
...
]

Since this is DFS, we want to explore these in order. This means all we need to do in concat them, and we have the full traversal. Bam!

> lazyDFT :: (SearchState a) => a -> [a]
> lazyDFT start = start : (concat $ map lazyDFT $ children start)

In the paper they call this the "and" operator for logical combination.

Now, what about Breadth-first search. Well we want to move one level at a time instead of following a single path to its end. If we know that our tree is balanced, we can simply transpose the matrix and then take the concat

[[Child1, Child1.1, Child1.2 ...],
[Child2, Child2.1, ...],
...
]
-->
[[Child1, Child2, Child3 ...],
[Child1.1, Child2.1, ...],
...
]

Here is the code using Data.List's lazy transpose function.

> lazyBFTBal :: (SearchState a) => a -> [a]
> lazyBFTBal start =
> start:(concat $ transpose $ map lazyBFTBal $ children start)

Hmm... Not sure the best way to proceed here. In all honesty, I find the rest of the combinators paper to be a bit confusing to say the least, but I'll try to fake it. I want to make sure I traverse the tree in order, so I'll preserve the order at each level. The new type signature is .

lazyBFT :: (SearchState a) => a -> [[a]]

and the matrix looks like,

[[[Child1], [Child1.1, Child1.2] ...],
[[Child2], [Child2.1], [Child2.1.1]],
...
]
transposed to
[[[Child1], [Child2] ...],
[[Child1.1, Child1.2], [Child2.1],...],
[ [Child2.1.1],..]
...
]

and then we concat each row,

[[Child1,Child2,...], [Child1.1,Child1.2],[Child2.1.1]]

Score! Back to where we started.

The code looks like,

> lazyBFT start = [start]:(map concat $ transpose $ map lazyBFT $ children start)

If I use it on the problem we started with yields.

*Main> take 3 $ lazyBFT (NumMaze 10)
[[NumMaze 10],[NumMaze 12,NumMaze 5,NumMaze 20],[NumMaze 14,NumMaze 6,NumMaze 24,NumMaze 7,NumMaze 10,NumMaze 22,NumMaze 10,NumMaze 40]]

Now lets add some of the pruning back.

> lazyBFT2 seen start = [start]:(map concat $ transpose $ map (lazyBFT2 seen') $ c)
> where c = [c| c <- children start , c `notMember` seen] > seen' = seen `union` (fromList c)

*Main> take 3 $ lazyBFT2 (singleton $ NumMaze 10) (NumMaze 10)
[[NumMaze 10],[NumMaze 12,NumMaze 5,NumMaze 20],[NumMaze 14,NumMaze 6,NumMaze 24,NumMaze 7,NumMaze 22,NumMaze 40]]



Okay, I'll stop here for now. Pretty cool stuff. I wonder, is there is a lazy way to do this pruning? Maybe next post...

Friday, June 15, 2007

JSON-Static

So I've been getting ready for work by playing with the Facebook web API. I got fed up pretty quickly with the xml interface, so I started using the json. I really like the simplicity of json, particularly for serializing types. I started working on this project in O'Caml, and I was really impressed by Martin Jambon's JSON libraries for O'Caml, json-static . You can just throw in the keyword "json" and it write all the boilerplate code for the serialization. i.e.

type json greeting = Hello | Goodbye

will write code for greeting_of_json and json_of_greeting that do the default conversion, turning Hello to "Hello" and Goodbye to "Goodbye". This is super cool. But, what if you have some weird data type, say

type json weird = {visible: int ; invisible:int}

and you don't want to serialize the invisible type. hmm. Well we can define a special module.

module Weird=
struct
type t = {visible: int ; invisible:int}
let of_json = ...
let to_json = ...
end

This is nice and all, but ... it is a real hack. O'Caml doesn't have this kind of polymorphism and this only works because of camlp4 magic.

However, this is exactly the kind of thing Haskell can do. So in this entry, I'll try to port this idea to Haskell in a neat way. First, we start with a JSON parser. I'm using the one from JSON.hs. This library uses parsec to do the conversion from strings to a JSON data type and from the type back to pretty-printed strings. All we need to do is auto-generate the boilerplate code to convert from the json type to other types.

So let's go. First we define the basic type class.


>module Data.Json where
>import qualified JSON as J

The Json class has two basic functions. toJson converts data types to a JSON data structure.
fromJson tries to convert a JSON type into a Haskell type. The two list functions are defined
so that we can handle the String newtype. The default implementations should handle these functions in the other cases.

>class Json a where
> toJson :: a -> J.Value
> fromJson :: (Monad m) => J.Value -> m a

These follow the list functions from the prelude.

> listToJson :: [a] -> J.Value
> listToJson = J.Array . map toJson
> listFromJson :: (Monad m) => J.Value -> m [a]
> listFromJson (J.Array a) = mapM fromJson a

Now we define a couple instances for the basic data types.

Some are really basic.

>instance Json Double where
> toJson = J.Number
> fromJson (J.Number a) = return a

A String is just a list of characters.

>instance Json Char where
> toJson = J.String . show
> fromJson (J.String a) = return $ head a
> listToJson = J.String
> listFromJson (J.String a) = return a

Lists just call the default list constructors.

>instance Json a => Json [a] where
> toJson = listToJson
> fromJson = listFromJson


This part was the basic stuff. Now we need to derive instances for arbitrary types. It took me forever to figure out the best way to do this. I started with DRiFT, but I found the interface kind of duct-tapey. Next, I tried SYB, which seemed really neat and perfect for this task. Instead of generating code, it allows you to use folds over the types themselves. The one problem with SYB though is that it is really difficult to incorporate type classes. SYB 3 shows you how to do this, but the implementation is not quite there yet.

So I settled on Data.Derive with Template Haskell. Data.Derive provides a bunch of helper methods for writing derive instances for types, and a ton of ways of inserting the derived instances back into the code. So without further ado, here is the code.

{-#OPTIONS_GHC -fth #-}
>module Data.Derive.Json(makeJson ) where

This is the Data.Derive library

>import Language.Haskell.TH.All
>import Language.Haskell.TH.Lib
>import Data.List
>import Control.Monad (liftM)
>import qualified Data.Map as M
>import qualified JSON as JS
>import qualified Data.Json as DJ



>makeJson :: Derivation
>makeJson = derivationQ json "Json"

For each derived element, we define a toJson and fromJson function.

>json :: Dec-> Q [Dec]
>json dat =
> do
> toClauses <- toJson
> fromJ <- fromJson dat
> return $ simple_instance "Json" dat
> [funN "toJson" toClauses,
> funN "fromJson" fromJ]
> where
> toJson = sequence [sclause [ctp ctr 'x'] `liftM` jsonit ctr |
> ctr <-dataCtors dat]

>litQ :: String -> ExpQ
>litQ = return . lit


This is the toJson code generator function. It generates code like this -

data Test = RecordName {field1::String, field2::Integer}

toJson (RecordName x1 x2) =
JS.Object $ M.fromList $ [("field1",toJson x1),("field2",toJson x2)]


>jsonit:: Con -> ExpQ
>jsonit ctr =
> case ctorFields ctr of
> [] -> [|JS.String $(litQ $ ctorName ctr)|]
> fl -> flds fl
> where
> flds :: [String] -> ExpQ
> flds f = [|JS.Object $ M.fromList $
> $( liftM lst $ mapM field $ zip [1..] f)|]
> field :: (Int,String) -> ExpQ
> field (n,f) = [| ( $(litQ f),
> DJ.toJson $(return $ vrn 'x' n)) |]

This code looks a bit confusing, but if you get by the weird Template Haskell syntax, it is pretty straightforward.

The second piece of code fromJson is a bit harder to understand. The first case is when there is a record. JSON.hs parses JS classes as a map from the string of the field to a JSON value. The obvious way to do this would be -

fromJson (J.Object m) = RecordName{ field1 = fromJson $ M.lookup m "field1";
field2 = fromJson $ M.lookup m "field2"}

However, we need to take into account that lookup or fromJson could fail. This means that we need to put it in a monad. I follow the lead of Data.Map and have fromJson return a value wrapped in a generic monad. To construct the record, we use the do syntax.

do
f1<-M.lookup m "field1" >>=fromJson
f2<-M.lookup m "field2" >>=fromJson
return $ RecordName{field1=f1;field2=f2}

I'm not sure how to do this with template haskell, so we'll throw out the syntactic sugar for now.

M.lookup m "field1" >>=fromJson >>=
(\f1-> M.lookup m "field2" >>=fromJson >>=
(\f2 -> return $ RecordName{field1=f1;field2=f2}))

We need a couple of tricks to do this. First, we need unique names for the variables, f1 - fn. We do this by using the Q monad built into Template Haskell. One of the cool things that the Q monad lets us do is create unique variables. So the code -- let f = newName "f" ; do {f1<-f;f2<-f;f3<-f; return (f1,f2,f3)} -- will create three unique names.

The other issue we need to handle is to tie the variable names to the fields. We could just choose arbitary names, but let's find a cooler way to do this. If we were writing a compiler, what would we do? Use CPS! If we use CPS we can mimic the lambdas within code inside the Template Haskell. This way we can pass around an environment of bound variables.

>type Env = [(String,Name)]
>type Conti = Env -> ExpQ

>fromJson :: Dec -> Q [Clause]
>fromJson dat = sequence $
> case dataCtors dat of
> [ctr] ->
> [clause [return $ ConP ('JS.Object) [VarP m]] (normalB $ t sd) []] -- fromJson (JS.Object m) =
> where
> m = mkName "m"

{-M.lookup "str" m >>= DJ.fromJson >>= (\ f1. {cont})-}

> t :: Conti -> ExpQ
> t = foldl (.) (\c->c []) $
> map bindOne $ ctorFields ctr
> where
> bindOne str cont env = do
> var <- newName "f"
> [|M.lookup $(litQ str) $(varE m) >>= DJ.fromJson >>=
> $(lamE [varP var] $ (cont ((str,var):env))) |]

{-Construct the record from the environment.
return $ RecordName{field1=f1;field2=f2}-}

> sd a = [| return $(rec a) |]
> where
> pair (s,n) = return (mkName s,VarE n)
> rec = recConE ( mkName $ ctorName ctr) . map pair

The second case just reads in zero argument constructors as strings. Nothing exciting here.

> ctrs ->
> map (\s -> clause [return $ ConP ('JS.String) [LitP $ StringL s]]
> (normalB [|return $(conE $ mkName s)|]) []) $
> map ctorName ctrs

Anyway. I thought this was pretty cool. It is amazing what you can do with a little bit of template code. Gets you out of writing a ton of annoying processing code.