Forest path

This is a path through a dimly lit forest, curving from south to east. A large tree with low branches stands by the edge of the path. On the ground is a pile of leaves.

> count the leaves

69,105.

> climb the tree


Code for this episode can be found at

https://github.com/ericlippert/flathead/tree/blog1

Instructions for installing OCaml can be found in the readme, or in the first episode in this series.

Last time we explored the question “how do I get some number of bits out of a word, starting at a particular bit?” The code we came up with was:

let fetch_bits high length word =
  let mask = lnot (-1 lsl length) in
  (word lsr (high - length + 1)) land mask

I wrote this helper method and a few others like it and everything was great, right up until I spent I kid you not a solid hour trying to figure out why a computation was wrong:

blah blah blah (fetch_bits branch 5 6) blah blah blah

Now I’ve made it pretty clear where the mistake is here, but I assure you it was by no means clear what was wrong when the program started giving me completely crazy results.

The problem of course is that we have a function that takes three integers and I passed them in the wrong order. This gets 5 bits of the number 6, starting at bit “branch”, when I intended to say 6 bits of number “branch” starting at bit 5.

This is of course an easy bug in most languages. In C#, if I had a similar helper method, I could easily have written FetchBits(b, 5, 6) just as easily, and been just as wrong.

There are a number of things I could have done to prevent the bug. OCaml, like C#, allows arguments to be annotated with the name of the corresponding parameter, which certainly would have been a big help. But I am not in the habit of using this feature, and I was thinking: surely in a language with excellent type inference, there ought to be a way to have the compiler itself detect the error.

Writing this bug really got me thinking about how powerful a tool the type checker is in a statically typed language, and what limited use we make of it. A statically typed language is a tool for producing a partial proof that the program is correct, or at least not broken in some obvious, detectable way.

I also started thinking about how many different things the Z-machine uses integers for. We’re only a couple episodes in and we’ve seen bit numbers, bits and words. The Z-machine uses integers to identify addresses, variables, branch modes, opcodes, characters, screen widths, and dozens upon dozens more things, none of which make any sense to be assigned to each other. So:

Principle #3: Use “naked” primitive types like integers and bools as little as possible. Use the type system to both express meaning and provide evidence of correctness.

As we’ll see, by using the type system wisely we can also make the code easier to understand and to emphasize its meaning, rather than its mechanisms.

A nice idea. How do we do it?

OCaml both allows and encourages creation of super-lightweight “wrapper” types around any other type; I think of them as “tagging” the underlying type with a meaning, much the same way that an enumerated type in C# is essentially just a meaningful wrapper around an integral type. (They are not exactly enumerated types; later on we’ll use a similar technique to make enumerated types in OCaml.) The type declaration could not be simpler:

type bit_number = Bit_number of int
type bit_size = Bit_size of int

The left of the equality gives a name to the type. The right of the equality gives a constructor: to construct an instance of bit_number, we say what constructor we want to use: Bit_number. Let’s try it out:

let b = fetch_bits (Bit_number 5) (Bit_size 6) 0xBEEF

Hmm. That gives us a type checking error:

Error: This expression has type bit_number but an expression 
was expected of type int

We will need to do bit arithmetic on the wrapped number, which means that there needs to be a way to unwrap it. Let’s rewrite fetch_bits as follows:

let fetch_bits (Bit_number high) (Bit_size length) word =
  let mask = lnot (-1 lsl length) in
  (word lsr (high - length + 1)) land mask

And now our call succeeds. OCaml’s type checker says “I need something in the first argument that matches the pattern an integer wrapped in a bit number, and I wish to name that integer high“.

Now, on the one hand we now have solved the problem; an attempt to pass a naked integer for either the bit number or the bit size will fail, and similarly so will any attempt to swap the number and the size. The type system now guarantees that I cannot write my bug! But at what cost?

let result = fetch_bits (Bit_number 5) (Bit_size 6) word

Ugly! But hey, if I am going to be passing constants to those things then why don’t I just define some constants?

let bit0 = Bit_number 0
let bit1 = Bit_number 1
...
let size1 = Bit_size 1
let size2 = Bit_size 2
...

I can easily define all the constants I need. And now my code looks like this:

let result = fetch_bits bit5 size6 word

The code is not only completely type-safe, preventing me from making stupid transposition bugs, but it is also far more self-documenting than any of my earlier attempts. The code does not prevent stupid bugs like asking for ten bits starting at bit two, but I think I am unlikely to write that bug now; if I felt that I was then I could put additional checks into the helper method.

If you take a look at the code for this episode on github you’ll notice that I’ve put the type declarations and utility functions and main method all in separate files. Obviously that is a bit silly for such a small example, but we’ll be adding a lot more code soon and it will be convenient to have logical areas separated into different modules. More on that later.

All right, that’s enough on bit twiddling. Next time in this series: we’ll take a look at the “story file”, which is the heart of the Z-machine.

22 thoughts on “Forest path

  1. I wish there was a simple way to do this in C#. I’ve created wrapper structs around primitive types several times in the past to avoid this kind of bug and to leverage the type inference benefits. But it is quite a bit of boiler-plate code to express something simple. And in most cases I’d prefer that once compiled, it performs & behaves identically to the underlying primitive type.

    • I cannot disagree more. The number of things to implement for a simple wrapper is quite a mouthful to say the least. There’s the overrides on System.Object, then interfaces like IComparable and IEquatable and IFormattable, then there’s the operator overloads, etc. A simple language construct to wrap an existing type would be very useful indeed.

  2. using separate source files in VS2013, took me a while to realize that F# needs the correct compilation order (no forward reference allowed in other words). Right-click files and choose move up or move down to get them into the right order. (using “module X” and “open X” lines to declare and reference modules, respectively.)

  3. dmo has already beaten me to it, so let me instead to support the point made.

    In a recently previous life, I wrote financial software and we had templates to lightly wrap decimal so that we could manage Notional Value, Market Value, PreHaircut, PostHaircut, and so on with type-safe conversions.

    It would have been so nice to have a succinct way to tell the compiler “The new type NotionalValue should be treated exactly like a decimal except that it should not be implicitly convertible to any other type”

    Presumably simply and non-breaking?

    • Should add that if the value were (optionally) convertible from the underlying type to the new type, then it would still permit something like

      NotionalValue nv = 1234m;

      without having to add the explicit conversion from decimal.

  4. Bit-twiddling in a pure functional style. Of course.

    I eagerly await your next series at the other extreme, an automatic inference engine in 6502 assembler.

  5. Couldn’t you just pass high and length as a 2-uple instead?
    fetch_bits (5,6) word
    I think it would work well to prevent the specific mistake of getting the order wrong.

    Of course, your typeful approach has the additional advantage of avoiding accidental manipulation of bit numbers and sizes as integers.

  6. I see the mysterious plan. By making an unusual choice of language, you’ve inspired your readers to translate it into a bunch of others. There’ll be a Flathead for every occasion!

  7. Pingback: North of house | Fabulous adventures in coding

  8. Pingback: The Morning Brew - Chris Alcock » The Morning Brew #2026

  9. Well, this being mainly a c# blog, I’ve decided to start a small project implementing the FlatHead ZMachine in…..yup thats right, c#! Surprise, surprise. Probably not half as interesting as other translations already up and running (Haskell, F#, etc.) but I figured out it would be a fun excercise. If anyone is remotely interested you can find it at CSharpZMachine.
    I wont have time to blog about the experience but I will update the project’s readme file explaining what I’m doing as I go along.

  10. Pingback: Up a tree | Fabulous adventures in coding

  11. Explicit types for adjacent parameters is exactly the sort of thing I would have done 3 years ago, but today feels a bit heavy to me. Perhaps my coworkers beat me out of pulling this sort trick in more verbose languages and I’m improperly transferring those lessons back.

    Some alternatives:

    * In a curly brace language, use builders to name parameters

    `bits.skip(5).take(6)`

    * In currying languages, use some infix operator

    bits |> drop 5 |> take 6

    * Demand that all bit twiddling be wrapped in semantically named functions which you unit test thoroughly

    let opcode_number instruction = fetch_bits 12 4
    assert_equal 0xF (opcode_number 0xBEEF)

    (Relying on unit tests is probably what I’d do. Only constants will be fed into fetch_bits. If the caller of fetch_bits made an error, that error will be a mistranscribed value. Miscomposed functions will not be the problem; type checking won’t save you. The *shape* of your call graph will be correct; only the *value* of the leaves will be incorrect. You *need* to use unit tests to catch this sort of bug in most cases.)

    * Use language-specific interval idioms that you’ve read so often that they’re burnt into your brain

    python_list[-4:-2]

    * Write a program to interpret types of opcodes from some sort of opcode spec. Question begging, but thinking about how I would construct an interpreter might yield a 1 in 4 shot of coming up with a really elegant solution (but a 1 in 8 chance of falling prey to a useless time sink).

    I’d be more amenable to the explicit type creation technique you used here if the constants “bit1” and “size5” had more semantic names. But my guess is that either (1) names wouldn’t map nicely to the standard language, or (2) the concepts mentioned in the standard, like “opcode_header_length”, would be used as both sizes and offsets.

    IDPIMLPAHFAHISY (I don’t program in ML professionally and haven’t for a hobby in several years).

  12. Pingback: Zork vs. Haskell, part 2. | Squandered Genius

  13. Pingback: Tools, Fools, and Types – willmurphyscode

Leave a comment