Friday, June 15, 2007


So I've been getting ready for work by playing with the Facebook web API. I got fed up pretty quickly with the xml interface, so I started using the json. I really like the simplicity of json, particularly for serializing types. I started working on this project in O'Caml, and I was really impressed by Martin Jambon's JSON libraries for O'Caml, json-static . You can just throw in the keyword "json" and it write all the boilerplate code for the serialization. i.e.

type json greeting = Hello | Goodbye

will write code for greeting_of_json and json_of_greeting that do the default conversion, turning Hello to "Hello" and Goodbye to "Goodbye". This is super cool. But, what if you have some weird data type, say

type json weird = {visible: int ; invisible:int}

and you don't want to serialize the invisible type. hmm. Well we can define a special module.

module Weird=
type t = {visible: int ; invisible:int}
let of_json = ...
let to_json = ...

This is nice and all, but ... it is a real hack. O'Caml doesn't have this kind of polymorphism and this only works because of camlp4 magic.

However, this is exactly the kind of thing Haskell can do. So in this entry, I'll try to port this idea to Haskell in a neat way. First, we start with a JSON parser. I'm using the one from JSON.hs. This library uses parsec to do the conversion from strings to a JSON data type and from the type back to pretty-printed strings. All we need to do is auto-generate the boilerplate code to convert from the json type to other types.

So let's go. First we define the basic type class.

>module Data.Json where
>import qualified JSON as J

The Json class has two basic functions. toJson converts data types to a JSON data structure.
fromJson tries to convert a JSON type into a Haskell type. The two list functions are defined
so that we can handle the String newtype. The default implementations should handle these functions in the other cases.

>class Json a where
> toJson :: a -> J.Value
> fromJson :: (Monad m) => J.Value -> m a

These follow the list functions from the prelude.

> listToJson :: [a] -> J.Value
> listToJson = J.Array . map toJson
> listFromJson :: (Monad m) => J.Value -> m [a]
> listFromJson (J.Array a) = mapM fromJson a

Now we define a couple instances for the basic data types.

Some are really basic.

>instance Json Double where
> toJson = J.Number
> fromJson (J.Number a) = return a

A String is just a list of characters.

>instance Json Char where
> toJson = J.String . show
> fromJson (J.String a) = return $ head a
> listToJson = J.String
> listFromJson (J.String a) = return a

Lists just call the default list constructors.

>instance Json a => Json [a] where
> toJson = listToJson
> fromJson = listFromJson

This part was the basic stuff. Now we need to derive instances for arbitrary types. It took me forever to figure out the best way to do this. I started with DRiFT, but I found the interface kind of duct-tapey. Next, I tried SYB, which seemed really neat and perfect for this task. Instead of generating code, it allows you to use folds over the types themselves. The one problem with SYB though is that it is really difficult to incorporate type classes. SYB 3 shows you how to do this, but the implementation is not quite there yet.

So I settled on Data.Derive with Template Haskell. Data.Derive provides a bunch of helper methods for writing derive instances for types, and a ton of ways of inserting the derived instances back into the code. So without further ado, here is the code.

{-#OPTIONS_GHC -fth #-}
>module Data.Derive.Json(makeJson ) where

This is the Data.Derive library

>import Language.Haskell.TH.All
>import Language.Haskell.TH.Lib
>import Data.List
>import Control.Monad (liftM)
>import qualified Data.Map as M
>import qualified JSON as JS
>import qualified Data.Json as DJ

>makeJson :: Derivation
>makeJson = derivationQ json "Json"

For each derived element, we define a toJson and fromJson function.

>json :: Dec-> Q [Dec]
>json dat =
> do
> toClauses <- toJson
> fromJ <- fromJson dat
> return $ simple_instance "Json" dat
> [funN "toJson" toClauses,
> funN "fromJson" fromJ]
> where
> toJson = sequence [sclause [ctp ctr 'x'] `liftM` jsonit ctr |
> ctr <-dataCtors dat]

>litQ :: String -> ExpQ
>litQ = return . lit

This is the toJson code generator function. It generates code like this -

data Test = RecordName {field1::String, field2::Integer}

toJson (RecordName x1 x2) =
JS.Object $ M.fromList $ [("field1",toJson x1),("field2",toJson x2)]

>jsonit:: Con -> ExpQ
>jsonit ctr =
> case ctorFields ctr of
> [] -> [|JS.String $(litQ $ ctorName ctr)|]
> fl -> flds fl
> where
> flds :: [String] -> ExpQ
> flds f = [|JS.Object $ M.fromList $
> $( liftM lst $ mapM field $ zip [1..] f)|]
> field :: (Int,String) -> ExpQ
> field (n,f) = [| ( $(litQ f),
> DJ.toJson $(return $ vrn 'x' n)) |]

This code looks a bit confusing, but if you get by the weird Template Haskell syntax, it is pretty straightforward.

The second piece of code fromJson is a bit harder to understand. The first case is when there is a record. JSON.hs parses JS classes as a map from the string of the field to a JSON value. The obvious way to do this would be -

fromJson (J.Object m) = RecordName{ field1 = fromJson $ M.lookup m "field1";
field2 = fromJson $ M.lookup m "field2"}

However, we need to take into account that lookup or fromJson could fail. This means that we need to put it in a monad. I follow the lead of Data.Map and have fromJson return a value wrapped in a generic monad. To construct the record, we use the do syntax.

f1<-M.lookup m "field1" >>=fromJson
f2<-M.lookup m "field2" >>=fromJson
return $ RecordName{field1=f1;field2=f2}

I'm not sure how to do this with template haskell, so we'll throw out the syntactic sugar for now.

M.lookup m "field1" >>=fromJson >>=
(\f1-> M.lookup m "field2" >>=fromJson >>=
(\f2 -> return $ RecordName{field1=f1;field2=f2}))

We need a couple of tricks to do this. First, we need unique names for the variables, f1 - fn. We do this by using the Q monad built into Template Haskell. One of the cool things that the Q monad lets us do is create unique variables. So the code -- let f = newName "f" ; do {f1<-f;f2<-f;f3<-f; return (f1,f2,f3)} -- will create three unique names.

The other issue we need to handle is to tie the variable names to the fields. We could just choose arbitary names, but let's find a cooler way to do this. If we were writing a compiler, what would we do? Use CPS! If we use CPS we can mimic the lambdas within code inside the Template Haskell. This way we can pass around an environment of bound variables.

>type Env = [(String,Name)]
>type Conti = Env -> ExpQ

>fromJson :: Dec -> Q [Clause]
>fromJson dat = sequence $
> case dataCtors dat of
> [ctr] ->
> [clause [return $ ConP ('JS.Object) [VarP m]] (normalB $ t sd) []] -- fromJson (JS.Object m) =
> where
> m = mkName "m"

{-M.lookup "str" m >>= DJ.fromJson >>= (\ f1. {cont})-}

> t :: Conti -> ExpQ
> t = foldl (.) (\c->c []) $
> map bindOne $ ctorFields ctr
> where
> bindOne str cont env = do
> var <- newName "f"
> [|M.lookup $(litQ str) $(varE m) >>= DJ.fromJson >>=
> $(lamE [varP var] $ (cont ((str,var):env))) |]

{-Construct the record from the environment.
return $ RecordName{field1=f1;field2=f2}-}

> sd a = [| return $(rec a) |]
> where
> pair (s,n) = return (mkName s,VarE n)
> rec = recConE ( mkName $ ctorName ctr) . map pair

The second case just reads in zero argument constructors as strings. Nothing exciting here.

> ctrs ->
> map (\s -> clause [return $ ConP ('JS.String) [LitP $ StringL s]]
> (normalB [|return $(conE $ mkName s)|]) []) $
> map ctorName ctrs

Anyway. I thought this was pretty cool. It is amazing what you can do with a little bit of template code. Gets you out of writing a ton of annoying processing code.

No comments: