All Unkept
Posted in: Python, Haskell  —  November 14, 2012 at 12:23 AM

Dynamic typing in a statically typed language

by Luke Plant

A recent question on programmers.stackexchange.com asked "What functionality does dynamic typing allow?"

I thought one of the best short answers to this was from Mark Ransom:

Theoretically there's nothing you can't do in either, as long as the languages are Turing Complete. The more interesting question to me is what's easy or natural in one vs. the other.

This post is about providing an example to back that up, and to respond to people who claim that, since you can implement dynamic types in a statically typed language, statically typed languages give you all the benefits of dynamically typed languages.

[Edit: to those who think I'm being a language or dynamic typing advocate or engaging in any kind of bashing, please read that last paragraph again, and note especially the use of the word 'all'.]

Let's set up a problem. It's made up, but it illustrates the point I want to make:

Given a file, 'invoices.yaml', take the first document in it, extract the 'bill-to' field, and save the data in it as JSON in an output file 'address.json'. You can take it for granted that the contents of that field can be serialised as JSON (e.g. it doesn't contain dates), although that might not be true for the rest of the document. To keep the example focussed and simple, everything will be ASCII.

The particular YAML file I used was taken from an example YAML document I found on the web, and then expanded for the sake of illustration:

---
invoice: 34843
date   : 2001-01-23
bill-to:
    given  : Chris
    family : Dumars
    address:
        lines: |
            458 Walkman Dr.
            Suite #292
        city    : Royal Oak
        state   : MI
        postal  : 48046
---
invoice: 34844
date   : 2001-01-24
bill-to:
    given  : Pete
    family : Smith
    address:
        lines: |
            3 Amian Rd
        city    : Royal Oak
        state   : MI
        postal  : 48047

I'll use Python and Haskell as representatives of dynamic typing and static typing, because I know them and many would consider them to be very good representatives of their camps, and I'm a big fan of both languages.

I also think that examining any programming problem in the abstract, or with respect to ideas like ‘dynamic typing’ or ‘static typing’, is not very relevant, because in the real world we have to use real, concrete languages, and they come with a whole set of properties (in terms of the language definition, tool sets, communities and libraries) that make a massive impact on how you actually use them.

So I'm going to try to use real libraries that actually exist, ignore solutions that could theoretically exist but don't, and ignore problems that could theoretically exist but don't.

Python

Here is my Python solution:

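import yaml
import json

# Current PyYAML requires an explicit Loader; SafeLoader avoids constructing arbitrary objects.
with open('invoices.yaml') as src, open('address.json', 'w') as dst:
    first_doc = next(yaml.load_all(src, Loader=yaml.SafeLoader))
    json.dump(first_doc['bill-to'], dst)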

Notes: I didn't have to consult docs once. This isn't just due to my familiarity with Python — it's also the fact that I can fire up IPython and go:

In [1]: import yaml
In [2]: yaml.<TAB>

and get a list of likely functions. I can then go:

In [3]: yaml.load_all?

and get help, or go:

In [4]: yaml.load_all??

and get the complete source code of the function/method/class/module, in case I need it.
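
Outside IPython, roughly the same lookups are available from the standard library: inspect.signature and inspect.getdoc cover most of what `?` shows, and inspect.getsource returns what `??` shows. A minimal sketch, using the standard json module as the target so that it runs without PyYAML installed:

```python
import inspect
import json

# Rough equivalent of `json.dump?`: the signature plus the docstring.
signature = inspect.signature(json.dump)
doc = inspect.getdoc(json.dump) or ""
print(signature)
print(doc.split("\n", 1)[0])

# Rough equivalent of `json.dump??`: the full source of the function.
source = inspect.getsource(json.dump)
print(source.splitlines()[0])
```

This only works for code written in Python; functions implemented in C have no source for inspect.getsource to return.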

Haskell

Now for the Haskell version. First, a disclaimer: I'm much less experienced in Haskell than in Python. I did manage to write my blog software in Haskell at one point, but I don't use Haskell on anything like a daily basis, and I do use Python that much.

I first need to parse YAML, and I've got a choice of packages. Unlike in Python, for a library like this the choice you make is likely to have a big impact on the code you write: switching to a different (perhaps faster) package won't just be a case of changing an import, as we will see. That there are several packages reflects the fact that even designing how such a library should work, in terms of API and data structures, is not straightforward in Haskell. Picking one is a much bigger commitment, and therefore a bigger problem, for the library user. In Python, while there are a few API choices (like whether to support streaming), it's mostly pretty obvious how the library should work.

Looking on Hackage, I first find the 'yaml' package. The first line of the Data.Yaml API docs reads:

A JSON value represented as a Haskell value.

(Yes, you read that right.) This doesn't look good. The whole file has stuff about JSON, not YAML, with no indication of why I would want to be using JSON values rather than YAML ones. But I have a go anyway; perhaps it was deliberate.

When trying to use the decodeFile function, I get an error about needing a type signature, due to the way decodeFile is defined:

decodeFile :: FromJSON a => FilePath -> IO (Maybe a)

There are lots of instances of FromJSON to choose from, but I have to know the type of the data in advance. And it looks like I've got data that isn't going to fit into any of those types, because it involves heterogeneous collections. [Correction in comments, see below].

I gave up and tried another package, HsSyck, which provides Data.Yaml.Syck.

First try:

import Data.Yaml.Syck

main = do
  d <- parseYamlFile "invoices.yaml"
  print d

This works. Well, I've got some kind of parsing going on, at least. It looks like I've got some YamlNode data structure, and the top thing is an EMap (it looks like it has only parsed the first document, which is worrying, but it doesn't matter given my requirements, so I'll ignore that). But how do I get data out?

OK, let's try yaml-light. It wraps HsSyck and has some easier utility functions, like lookupYL:

lookupYL :: YamlLight -> YamlLight -> Maybe YamlLight

That expects the lookup key to be a YamlLight, so I need to create one from a string somehow. The docs show how to turn a ByteString into a YamlLight node, but what I have is a String, which from previous experience means doing something like pack from Data.ByteString.Char8.

My program so far:

import Data.Yaml.YamlLight
import Data.ByteString.Char8 (pack)
import Data.Maybe

main = do
  d <- parseYamlFile "invoices.yaml"
  print $ fromJust $ lookupYL (YStr $ pack "bill-to") d

Printing the whole parsed document d (rather than just the looked-up field) gives this output:

YMap (fromList [(YStr "bill-to",YMap (fromList [(YStr "address",YMap (fromList [(YStr "city",YStr "Royal Oak"),(YStr "lines",YStr "458 Walkman Dr.\nSuite #292\n"),(YStr "postal",YStr "48046"),(YStr "state",YStr "MI")])),(YStr "family",YStr "Dumars"),(YStr "given",YStr "Chris")])),(YStr "date",YStr "2001-01-23"),(YStr "invoice",YStr "34843")])

Now I have to dump to JSON. From a Python perspective, all I want is a function that can take some ‘native values’ and dump them to JSON, like the Python json.dump function. But every piece of data in my data structure is wrapped in things like YStr and YMap.

In addition, though I can see the structure of my data in front of me, the requirements I've been given don't make guarantees that it will stay the same, just that it can be converted to JSON. I need a routine that will convert anything YAML to the equivalent in JSON, where that is possible.

It looks like I could create a JSON instance for YamlLight, so that the encode function I want to use (which dumps JSON to a string) could take YamlLight as an input directly. I end up with this:

import Data.Yaml.YamlLight (parseYamlFile, lookupYL, YamlLight(..), unStr)
import Data.ByteString.Char8 (pack, unpack)
import Text.JSON (JSON(..), encode, JSValue(..), toJSString, toJSObject)
import Data.Maybe (fromJust)
import Data.Map (toList)

instance JSON YamlLight where
  showJSON yml =
    case yml of
      YStr bs -> JSString $ toJSString $ unpack bs
      YMap m -> JSObject $ toJSObject $
                map (\(y1, y2) -> (unpack $ fromJust $ unStr y1, showJSON y2)) $
                toList m
      YSeq ymls -> JSArray $ map showJSON ymls
      YNil -> JSNull

main = do
  d <- parseYamlFile "invoices.yaml"
  writeFile "address.json" $ encode $ fromJust $ lookupYL (YStr $ pack "bill-to") d

This works, and I'm sure there are other solutions. If I knew Haskell better, I could perhaps write something cleverer and shorter, but it would also be proportionately harder for someone else to understand. Since this version does the job, I'm not particularly interested in making it shorter.
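
For readers who don't read Haskell fluently, here is roughly what that JSON instance does, transliterated into Python. The four classes are hypothetical stand-ins for the YamlLight constructors, not part of any real library, and the ASCII decoding reflects the simplifying assumption in the problem statement:

```python
import json
from dataclasses import dataclass


# Hypothetical stand-ins for the YamlLight constructors used above.
@dataclass(frozen=True)
class YStr:
    value: bytes


@dataclass(frozen=True)
class YSeq:
    items: tuple


@dataclass(frozen=True)
class YMap:
    pairs: tuple  # tuple of (key node, value node) pairs


@dataclass(frozen=True)
class YNil:
    pass


def to_json_value(node):
    """Convert a tagged YAML node into plain values that json.dumps accepts."""
    if isinstance(node, YStr):
        return node.value.decode("ascii")
    if isinstance(node, YSeq):
        return [to_json_value(item) for item in node.items]
    if isinstance(node, YMap):
        result = {}
        for key, value in node.pairs:
            # JSON object keys must be strings; this mirrors `fromJust $ unStr y1`.
            if not isinstance(key, YStr):
                raise TypeError("JSON object keys must be strings")
            result[key.value.decode("ascii")] = to_json_value(value)
        return result
    if isinstance(node, YNil):
        return None
    raise TypeError(f"not a YAML node: {node!r}")


doc = YMap((
    (YStr(b"given"), YStr(b"Chris")),
    (YStr(b"address"), YMap(((YStr(b"state"), YStr(b"MI")),))),
))
print(json.dumps(to_json_value(doc)))
# prints {"given": "Chris", "address": {"state": "MI"}}
```

In Python you would not normally need to write any of this, because the YAML library already hands back plain dicts, lists and strings.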

But this illustrates why some people like dynamically typed languages. The fact that you can implement a variant data type in Haskell (such as YamlLight or JSValue) doesn't mean much, because these data types are not used everywhere, and therefore you have multiple competing ones that you've got to convert between. If you did have a single variant datatype that was used everywhere... you'd have a dynamically typed language, in effect.
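
To make that concrete in Python terms: the value PyYAML hands back for the 'bill-to' field is built from ordinary dict and str objects (plus an int for the unquoted postal code), which is exactly what json.dumps accepts. The sketch below writes that value out by hand so that it runs without PyYAML; the literal is a transcription of the example document, not captured output:

```python
import json

# What a YAML loader would return for the first document's 'bill-to' field,
# written out by hand (assumption: the unquoted 48046 loads as an int).
bill_to = {
    "given": "Chris",
    "family": "Dumars",
    "address": {
        "lines": "458 Walkman Dr.\nSuite #292\n",
        "city": "Royal Oak",
        "state": "MI",
        "postal": 48046,
    },
}

# No wrapper types to strip off: the same objects go straight into json.
encoded = json.dumps(bill_to)

# Reading the JSON back gives an equal value, built from the same native types.
decoded = json.loads(encoded)
print(decoded == bill_to)
```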

The strictness of the type system gave rise to a choice of libraries and APIs that made my life harder, not easier. I then had to write glue code to marshal between the dynamic types used by the two libraries I needed. [Edit: or, as it turned out, I need to know where to find it, possibly in the form of already written type class instances, or how to get the compiler to write it for me.]

Some people might still prefer the Haskell version. It has some nice properties, like the fact that the compiler has checked that it can indeed convert any YAML object into JSON: you'd get a warning if you missed a case. One response to that might be that this benefit would disappear if the two types didn't happen to match so well, for instance if the YAML library started supporting date/time objects. If you need to avoid all possible problems up front, Haskell will help you out more. Python, on the other hand, will allow you to avoid spending time thinking about theoretical problems which may never happen in reality.
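
The date case makes a handy illustration of that trade-off in Python. The top-level date field loads as a datetime.date, which the standard json module refuses to serialise, but you only find that out at run time, and only if you try to dump that part of the document. A small sketch, with the document written out by hand rather than loaded from the file:

```python
import datetime
import json

# Hand-written stand-in for the first document as a YAML loader would return it.
invoice = {
    "invoice": 34843,
    "date": datetime.date(2001, 1, 23),
    "bill-to": {"given": "Chris", "family": "Dumars"},
}

# The field the task asks for contains only JSON-friendly values.
bill_to_json = json.dumps(invoice["bill-to"])

# Dumping the whole document fails, but only when this line actually runs.
try:
    json.dumps(invoice)
    whole_document_ok = True
except TypeError as exc:
    whole_document_ok = False
    print(exc)
```

The program that only ever touches 'bill-to' never hits the error, which is the "theoretical problem" in question.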

But there are always runtime errors that you could come across, even in Haskell — for example, if you want to convert this to cope with non-ASCII documents, the compiler can't point out all the places you need to fix, and if you forget one you could still get a runtime exception, or worse, silent data corruption.
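
As an illustration of the silent-corruption case: the documentation for pack in Data.ByteString.Char8 says that characters are truncated to 8 bits. The Python function below imitates that behaviour (truncate8 is a hypothetical helper, not a library function) to show what would happen to a non-ASCII name:

```python
def truncate8(text):
    """Imitate 8-bit packing by keeping only the low 8 bits of each code point."""
    return bytes(ord(ch) & 0xFF for ch in text)


# ASCII survives unchanged.
ascii_name = truncate8("Dumars")
print(ascii_name)

# Code points above 255 are silently mangled: no exception is raised.
mangled_name = truncate8("Łódź")
print(mangled_name)
```

Here the capital Ł (U+0141) comes out as a plain A, and nothing in the program reports that anything went wrong.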

So, in my opinion, this is a case where dynamic typing shines, and the ability to implement dynamic typing on top of static typing simply doesn't give you the benefits you get in a language that embraces dynamic typing to its core.

There are, incidentally, some interesting developments in Haskell that might allow the possibility of running programs that aren't quite typed correctly, as long as you don't encounter the type errors in practice. This could counter some of the points I've raised. See this interview with Simon Peyton Jones, from 27:45 onwards.
