JSON Autotype

Introduction

Static typing promises safer code and easier scaling to large projects. Polymorphic types with classes promise the high level of abstraction typical of scripting languages, combined with strong discipline.

However, not all of our data is typed to begin with. The most common data format used by web apps is JSON, which is dynamically typed. Web API descriptions are usually just free-form text with examples given as JSON documents.

How is one supposed to type that?

The JSON Autotype project promises to solve this problem automatically. It takes a couple of JSON documents and produces a valid type declaration. It also provides a parser and printer for this JSON format, and guarantees that the sample inputs will always parse.

It is also much faster than inferring the structure of JSON documents by eye. Some of our users have run it on megabytes and gigabytes of documents^[Please contact us if you are interested in funding work on scaling JSON parsing to large documents without compromising type safety and with minimal changes to the interface.].

Example

Take an example JSON document:

{
    "colorsArray":[{
            "colorName":"red",
            "hexValue":"#f00"
        },
        {
            "colorName":"green",
            "hexValue":"#0f0"
        },
        {
            "colorName":"blue",
            "hexValue":"#00f"
        }
    ]
}

It will be used to generate the following Haskell^[Or Elm. Please let us know if you are interested in funding work on other languages like PureScript, Scala, or F#.] type declaration:

data ColorsArray = ColorsArray {
    colorsArrayHexValue  :: Text,
    colorsArrayColorName :: Text
  } deriving (Show,Eq)

data TopLevel = TopLevel {
    topLevelColorsArray :: [ColorsArray]
  } deriving (Show,Eq)
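
To illustrate what a parser for these declarations can look like, here is a minimal hand-written sketch using the aeson library. It is not the code that JSON Autotype emits; the instances, the `sample` value and the `decoded` name are assumptions made for this example. The top-level field is typed as a list here because the sample document holds a JSON array.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Data.Aeson                 (FromJSON (..), eitherDecode,
                                             withObject, (.:))
import qualified Data.ByteString.Lazy.Char8 as BL
import           Data.Text                  (Text)

data ColorsArray = ColorsArray {
    colorsArrayHexValue  :: Text,
    colorsArrayColorName :: Text
  } deriving (Show,Eq)

-- Hand-written instance (an assumption of this sketch, not generated output):
-- each record field is read from the JSON key it was inferred from.
instance FromJSON ColorsArray where
  parseJSON = withObject "ColorsArray" $ \o ->
    ColorsArray <$> o .: "hexValue" <*> o .: "colorName"

data TopLevel = TopLevel {
    topLevelColorsArray :: [ColorsArray]
  } deriving (Show,Eq)

instance FromJSON TopLevel where
  parseJSON = withObject "TopLevel" $ \o ->
    TopLevel <$> o .: "colorsArray"

-- A shortened copy of the sample document.
sample :: BL.ByteString
sample = "{\"colorsArray\":[{\"colorName\":\"red\",\"hexValue\":\"#f00\"}]}"

-- Left carries a parse error message, Right the typed value.
decoded :: Either String TopLevel
decoded = eitherDecode sample
```

Once decoding succeeds, the rest of the program works with ordinary record fields such as colorsArrayColorName instead of raw JSON values.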

Consider a case of ambiguous types, like the following:

{
    "parameter":[{
            "parameterName" :"apiVersion",
            "parameterValue":1
        },
        {
            "parameterName" :"failOnWarnings",
            "parameterValue":false
        },
        {
            "parameterName" :"caller",
            "parameterValue":"site API"
        }]
}

The generated type uses the :|: union type (similar to Either, but without a tag):

data Parameter = Parameter {
    parameterParameterValue :: Bool :|: Int :|: Text,
    parameterParameterName  :: Text
  }

data TopLevel = TopLevel {
    topLevelParameter :: [Parameter]
  }
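
The idea of a union "without a tag" can be sketched as follows: the JSON carries no marker saying which alternative is present, so a decoder simply tries each alternative in turn. This is a simplified illustration and not the library's own definition. The `Scalar` type, the `FromScalar` class and the `decodeValue` function are hypothetical, and the constructor names `AltLeft` and `AltRight` are assumptions of this sketch.

```haskell
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE TypeOperators     #-}

-- Hypothetical stand-in for a JSON scalar value.
data Scalar = SBool Bool | SInt Int | SText String
  deriving (Show, Eq)

-- An untagged union in the spirit of :|: (constructor names are assumed).
data a :|: b = AltLeft a | AltRight b
  deriving (Show, Eq)

infixr 5 :|:

-- Hypothetical decoder class: Nothing means "the value has another shape".
class FromScalar a where
  fromScalar :: Scalar -> Maybe a

instance FromScalar Bool where
  fromScalar (SBool b) = Just b
  fromScalar _         = Nothing

instance FromScalar Int where
  fromScalar (SInt n) = Just n
  fromScalar _        = Nothing

instance FromScalar String where
  fromScalar (SText t) = Just t
  fromScalar _         = Nothing

-- Try the left alternative first; fall back to the right one if it fails.
instance (FromScalar a, FromScalar b) => FromScalar (a :|: b) where
  fromScalar v = case fromScalar v of
    Just a  -> Just (AltLeft a)
    Nothing -> AltRight <$> fromScalar v

-- Mirrors the parameterValue field, with String standing in for Text.
type ParameterValue = Bool :|: Int :|: String

decodeValue :: Scalar -> Maybe ParameterValue
decodeValue = fromScalar
```

Because :|: associates to the right in this sketch, decodeValue (SInt 1) gives Just (AltRight (AltLeft 1)): the Bool alternative is tried and rejected before the Int alternative matches.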

Theory behind the machine

We use a union type system without polymorphic type variables to infer the types of JSON documents. Each value is translated into a type:

data Type =
    TBool | TString | TInt | TDouble
  | TNull
  | TUnion (Set Type)
  | TObj        Dict
  | TArray      Type

A TUnion of a set of types reconciles values that sometimes have different primitive JSON types, and TObj represents a dictionary (a JSON object). Unification then reconciles the types of different values. It is another successful use of mathematical theory to simplify a practical software problem.
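
The two steps described above, translating values to types and then unifying them, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the project's actual implementation: `JValue` is a hypothetical stand-in for a parsed JSON value, `Dict` is assumed to be a map from keys to types, and `inferType`, `unify`, `simplify` and `emptyType` are names invented for this example. Treating a key that is missing from one object as TNull is also a choice made for this sketch.

```haskell
import           Data.Map (Map)
import qualified Data.Map as Map
import           Data.Set (Set)
import qualified Data.Set as Set

-- Hypothetical stand-in for a parsed JSON value.
data JValue
  = JNull
  | JBool Bool
  | JNumber Double
  | JString String
  | JArray [JValue]
  | JObject [(String, JValue)]
  deriving (Show, Eq)

-- Assumption: a dictionary type maps each key to the type of its value.
type Dict = Map String Type

data Type =
    TBool | TString | TInt | TDouble
  | TNull
  | TUnion (Set Type)
  | TObj        Dict
  | TArray      Type
  deriving (Show, Eq, Ord)

-- The empty union: the type of "no value seen yet".
emptyType :: Type
emptyType = TUnion Set.empty

-- Step 1: translate a single value into a type.
inferType :: JValue -> Type
inferType JNull        = TNull
inferType (JBool _)    = TBool
inferType (JNumber n)
  | n == fromInteger (round n) = TInt
  | otherwise                  = TDouble
inferType (JString _)  = TString
inferType (JArray vs)  = TArray (foldr (unify . inferType) emptyType vs)
inferType (JObject kv) = TObj (Map.fromList [(k, inferType v) | (k, v) <- kv])

-- Step 2: reconcile two types into one that covers both.
unify :: Type -> Type -> Type
unify a b | a == b          = a
unify (TArray a) (TArray b) = TArray (unify a b)
unify (TObj a)   (TObj b)   = TObj (Map.unionWith unify (pad a b) (pad b a))
  where
    -- A key present on only one side is treated as TNull on the other.
    pad m other = Map.union m (Map.map (const TNull) other)
unify a b = simplify (Set.union (members a) (members b))
  where
    members (TUnion s) = s
    members t          = Set.singleton t

-- A union with exactly one member is just that member.
simplify :: Set Type -> Type
simplify s = case Set.toList s of
  [t] -> t
  _   -> TUnion s
```

With these definitions, inferring the type of an array holding the three parameterValue entries from the earlier example (1, false and "site API") yields a TArray whose element type is a TUnion of TBool, TInt and TString. The sketch does not merge object or array types that are already inside a union; a complete implementation would need to handle that case.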

References

Details are described in the presentation for Engineers.SG and Monadic Warsaw. But you may prefer to look at the source code yourself!

Other articles about JSON Autotype:

  1. 24 days of Hackage 2015
  2. 24 dias de Hackage 2015
  3. Aelve guide to JSON lists JSON Autotype as an alternative instance generator.