Up a tree

Posted on February 9, 2016 by ericlippert

You are ten feet above the ground, nestled among large branches. Beside you on the branch is a small bird’s nest. In the bird’s nest is a large egg encrusted with precious jewels, apparently scavenged by a childless songbird.

> take the egg

Taken.

> go down
> go south
> go east

Source code for this episode can be found here: https://github.com/ericlippert/flathead/tree/blog2

All right, enough bit twiddling and philosophizing. Let’s start talking about the actual Z-machine architecture here. It is a very straightforward little virtual machine. We begin by dividing the state of the game up into two distinct buckets.

In the first bucket is a large array of bytes that is logically the “memory” of the virtual machine. This bucket contains all the code, all the strings, and data structures representing every object in the game. It also includes all the global variables. This first bucket we will call the “story state”, for reasons which will become apparent.

In the second bucket we have local variables, the call stack, and temporary storage. We’ll call this bucket the “interpreter state”.

An interpreter implementation takes as its input a file which is simply a straight up serialization of the story state. We’ll spend a lot of time in this series talking about how to decode that thing, but first, I need to go back to my first principles here. Remember that I said I was going to build this thing in a purely functional style: no side effects. How can that possibly work? The story contains all kinds of state which is constantly changing: the values of global variables, the relationships between game objects, and so on.

Well, remember how this works for immutable lists and stacks and dictionaries and whatnot. When you apply a change to an object, you don’t change the object. You make a new object that looks just like the old one, except slightly different. Isn’t that inefficient? No! Since the old object doesn’t change, you can reuse as much of its state as possible in the new object!

But how on earth am I going to make an immutable array of bytes that emulates a mutable array of bytes? Well, it turns out that in the Z-machine, almost all of the memory stays the same all the time. All the code, all the strings, they don’t ever change. In Zork, for example, there are only a few thousand bytes that change through the course of the game, and very few of them change on any given instruction.

Keeping track of a few thousand byte changes is nothing. We can easily do that. We’ll just keep a copy of the original state, and an immutable lookup table that describes every byte that has changed.

So let’s start with that. I want a data type which represents an immutable array of bytes where a write causes a new object to be created that contains the changed state. I want the interface to this data type to be:

reading takes a byte array and an address and produces a byte
writing takes a byte array, an address, and a byte, and produces a new byte array

What is an address? An integer obviously… NO WAIT! We decided last time that naked integers are to be avoided in a world where it is so cheap and convenient to tag them with their semantics. So let’s start by defining a type:

type byte_address = Byte_address of int

I feel better already. We’re also going to need an immutable map from integers to bytes. Fortunately one comes built-in:

module IntMap = Map.Make(struct type t = int let compare = compare end)

This one is a bit complicated. Both modules and types can be generic in OCaml, and this is the odd syntax for saying “I would like a module containing a generic dictionary from int to something to be specified later please.”

I need to know if an address is valid. Remember, there is nothing too small to be put into a helper function:

let is_in_range (Byte_address address) size = 
  0 <= address && address < size
    
let is_out_of_range address size =
  not (is_in_range address size)

Note that OCaml uses double-ampersand for Boolean-and, the usual less-than operators, but uses “not” rather than ! for Boolean negation.

And now our abstract data type. I’m going to put it in its own file, which logically defines a module; I’ll talk about what precisely a module is in more detail later, but you can think of it as being a bit like a class and a bit like a namespace. Let’s go through the immutable bytes module line by line:

open Type
open Utility

This is like a using namespace directive in C#; it says that members of these modules can be used without qualification.

type t =
{
  original_bytes : string;
  edits : char IntMap.t
}

We have here a new kind of user-defined type. Before we just made wrappers around ints. This is more like a struct; we have two fields, one of type string and one of type “map from integer to char”. Think of IntMap.t as being like Dictionary<int, V>, and char IntMap.t is like Dictionary<int, char>.

Yes, generic types are specified backwards in OCaml; I’m not sure why the type parameter comes before the generic type. This strikes me as odd and I don’t understand it yet.

The primary type in a module is traditionally called t for reasons having to do with generic modules that I’m not going to go into. Since inside the module it will almost never be referred to by name, and outside the module it will be Immutable_bytes.t, it doesn’t matter that this name is not descriptive in the least.

Notice that the fields in the record type are separated by semis, not terminated by semis.

I’m using a string as the backing store for my immutable array of bytes. Yes, apparently OCaml lives in the world of 8-bit chars and non-Unicode strings. Parts of this language definitely feel like they’re based on forty year old C libraries. (There is a mutable byte array type, but I’m pretty sure it’s just a thin wrapper around a string, and I don’t need mutability.)

Moving on. We know how to construct an instance of a wrapper type, but what about these record types? The traditional way to do it is to make a constructor function called make:

let make bytes =
  { original_bytes = bytes; edits = IntMap.empty }

To construct the actual instance we just put assignments to all the fields inside curly braces. We don’t have to say we are constructing a Immutable_bytes.t. OCaml deduces from the fact that all the fields are supplied what the type should be.

It will be handy to know what the largest possible address is:

let size bytes =
  String.length bytes.original_bytes

Note that fields of records are accessed with dots.

Is computing the length of a string O(1) as it is in C#, or O(n) in the size of the string as strlen is in C? I honestly don’t know, and the documentation I’ve looked at doesn’t seem to say. It would seem unfortunate to have to count the size of a large immutable string every time you access memory, but then again I haven’t seen any unacceptable performance problem yet, so I’m not going to stress about it. If it turned out to be a problem I could easily cache the string length in another field in the record. It’s only a few more bytes.

What does the read function do? It takes an Immutable_bytes.t record and an address, verifies that the address is in range, and then produces the byte:

let read_byte bytes address =
  if is_out_of_range address (size bytes) then
    failwith "address is out of range"

Several interesting points here.

We are seeing conditional expressions for the first time. Though they have a syntax reminiscent of a conditional statement, they have the semantics of a conditional expression. (That is, a ? : expression in C#.)

The consequence produces a failure exception which does not affect type inference. Note that since there are no statements per se, producing a failure exception is permitted as an expression, unlike in C#.

Again, nothing is manifestly typed. Note that in particular there is no annotation on bytes:

  else
    let (Byte_address addr) = address in
    let c =
      if IntMap.mem addr bytes.edits then IntMap.find addr bytes.edits
      else bytes.original_bytes.[addr] in
    int_of_char c

Bizarrely, for a language that has “option” types (which are like nullable value types in C#; either “some value” or “none”) the map type requires you to check for membership (“mem“) before trying to look up an item. I would have expected a method that returned the value if it existed and the “none” option otherwise. More on option types in a later episode!

Anyways, we check to see if we have an edit in the map of edits. If not then we go to the original string.

The indexing operator for strings is .[ ] which I think is charming.

The writing code takes a bytes record, an address and the new value, which is deduced to be an int.

We could check to see if the value being “changed” already has the given value, and if so, do nothing — simply return bytes. But setting a thing to itself seems unlikely, and so this is probably not a big win.

let write_byte bytes address value =
  if is_out_of_range address (size bytes) then
    failwith "address is out of range"
  else
    let (Byte_address addr) = address in
    let b = char_of_int value in
    { bytes with edits = IntMap.add addr b bytes.edits }

I find it slightly odd that the arguments to add are key, value, map, and not map, key, value, but whatever.

OCaml has no implicit conversions; the char_of_int method is necessary to convert an int to a char.

Again, remember, everything here is immutable. To construct a new record we could write another make-like function, but we have a nice syntactic sugar here for “I want a record that has all the fields the same as bytes, except I want field edits replaced with the value of that name given here”. We could also have said

    let new_edits = IntMap.add address b bytes.edits in
    { bytes with edits = new_edits }

Or, even nicer, I could have used this syntactic sugar:

    let edits = IntMap.add address b bytes.edits in
    { bytes with edits }

OCaml figures that I want to make a copy of the record with field “edits” replaced by the value of local “edits”.

All right, we have a very simple abstract data type. Let’s take it out for a spin:

let () = 
    let addr1 = Byte_address 1 in
    let bytes_a = Immutable_bytes.make "Hello world" in
    let bytes_b = Immutable_bytes.write_byte bytes_a addr1 65 in
    let b_a = Immutable_bytes.read_byte bytes_a addr1 in
    let b_b = Immutable_bytes.read_byte bytes_b addr1 in
    Printf.printf "%d %d\n" b_a b_b

And sure enough, the “mutated” version is different but the original version retains it’s state.

Next time on FAIC: We’ll build a slightly more complicated memory model on top of this foundation.

44 thoughts on “Up a tree”

Stuart on February 9, 2016 at 7:33 am said:

Last installment you said that nothing was too small to make a function of, so surely…

let try_find key map = if Map.mem key map then Some (Map.find key map) else None

(I’m making some guesses about syntax here, I only learned ML briefly twenty years ago and never the Ocaml variant)

Reply ↓
- ericlippert on February 9, 2016 at 9:28 am said:
  
  Good idea! I often do a similar helper for try-parse in C#.
  
  Reply ↓
Svick on February 9, 2016 at 7:39 am said:

Why does read_byte return and write_byte accept an integer instead of a char? What is the purpose of those conversions?

Reply ↓
- Stuart on February 9, 2016 at 7:46 am said:
  
  In fact, following the principles you’ve laid out in these posts, I’m surprised not to see a wrapper type for the byte values themselves, a la “type memory_value = Memory_value of char”…
  
  Reply ↓
  - ericlippert on February 9, 2016 at 9:30 am said:
    
    I’ve considered it; I might try it and see if it makes the code easier or harder to understand.
    
    Reply ↓
    - Jeff Nyman on August 11, 2019 at 4:21 am said:
      
      I have been curious about this. It seems the code rarely sticks to the principles articulated. Even when others bring up how those principles could apply, the code is rarely updated with those ideas, even when they’re stated to be good ideas.
      
      So I guess the question is: what’s the purpose of having the principles of having the principles in the first place? It’s been very hard to reason about what is and is not good OCaml (or just functional in nature) due to this, at least by going through this series.
      
      Reply ↓
Svick on February 9, 2016 at 7:49 am said:

I find it slightly odd that the arguments to add are key, value, map, and not map, key, value, but whatever.

In F#, the arguments are like this so that you can use the pipe operator to chain calls, the effect is similar to using fluent extension methods in C# (e.g. LINQ). This means that you can write code like this: IntMap.empty |> IntMap.add 1 42 |> IntMap.add 2 43.

I assumed that F# just inherited this from OCaml, but apparently OCaml got the pipe operator only in 2013, long after F# was first released. So now I have no idea.

Reply ↓
- alexstrada on February 9, 2016 at 8:40 am said:
  
  > I assumed that F# just inherited this from OCaml, but apparently OCaml got the pipe operator only in 2013, long after F# was first released. So now I have no idea.
  
  True. But you don’t need it to be defined in the language. You could just define it yourself (in any version of OCaml) like this:
  
  let (|>) x f = f x
  
  Reply ↓
  - Svick on February 9, 2016 at 8:44 am said:
    
    Right, but you wouldn’t change the order of arguments on a lot of functions just because of an operator that the user could define.
    
    sepp2k’s comment below that claims it’s about partial application makes sense to me.
    
    Reply ↓
    - Stuart on February 9, 2016 at 8:47 am said:
      
      They’re related: the operator can be defined *because* partial application works the way it does.
      
      Reply ↓
    - alexstrada on February 9, 2016 at 8:53 am said:
      
      Sure, partial application is the real reason, and it’s like that because IntMap is a functional data structure. Imperative data structures like hash tables (defined in Hashtbl module) don’t follow that pattern. Hashtbl.add is used like this: Hashtbl.add tbl x y (the hash table is the first argument).
      
      Reply ↓
      - EduardoS on February 10, 2016 at 4:22 am said:
        
        I am not “fluent” in OCaml, but, if I understood correctly, without the pipe operator, to add multiple members what you need to do is:
        IntMap.add 1 42 IntMap.add 2 43 IntMap.empty
        Or I am missing something like Haskell “$” operator?
        And, if the map were the first argument the code would be:
        IntMap.add IntMap.add IntMap.empty 1 42 2 43
        Which is not very clear.
        
        BTW, and another question from some experience with Haskell, isn’t there a lot of “let” in that code?
      - astrada on February 10, 2016 at 6:44 am said:
        
        Or I am missing something like Haskell “$” operator?
        
        There is no standard “$” operator in OCaml. You could define it, but it’s tricky to get the right operator priority. In your example, you need parenthesis:
        
        (IntMap.add 1 42 (IntMap.add 2 43 IntMap.empty))
        
        BTW, and another question from some experience with Haskell, isn’t there a lot of “let” in that code?
        
        In OCaml “let” cannot be omitted defining a value.
- panesofglass on February 9, 2016 at 8:48 am said:
  
  The collection instance is usually last so that you can use partial application. For example, let’s say you want to make a function that always inserts a value `v` into an map `m` with a constant key `”special-key”`. You could do something like `let set_special v m = Map.add “special-key” v m` or even just `let set_special = Map.add “special-key”`. The general idea is that you may want to partially apply, compose, etc. many functions before you have an instance against which to apply the function. If you took the instance first, you would have to write lambdas for each of these applications.
  
  Reply ↓
grico1974 on February 9, 2016 at 8:20 am said:

*OCaml has no implicit conversions; the char_of_int method is necessary to convert an int to a char.*. Isn’t that conversion always *explicit*?

Reply ↓
- sepp2k on February 9, 2016 at 8:31 am said:
  
  No, in some languages, including C, C++ and C#, the conversion is implicit, so for example “int x = ‘a’;” is legal without any explicit conversion.
  
  Reply ↓
  - grico1974 on February 9, 2016 at 8:38 am said:
    
    But that would be int_of_char wouldnt it? char_of_int is the opposite cast.
    
    Reply ↓
    - sepp2k on February 9, 2016 at 8:42 am said:
      
      Oops, good point. The conversion from int to char is also implicit in C¹ and C++, but not C#.
      
      ¹ In fact in C, but not C++, the type of a character literal is actually int, not char, so without that implicit conversion “char c = ‘c’;” would not be legal.
      
      Reply ↓
sepp2k on February 9, 2016 at 8:29 am said:

Yes, generic types are specified backwards in OCaml; I’m not sure why the type parameter comes before the generic type. This strikes me as odd and I don’t understand it yet.

The idea is that, for example, “string list” sounds more natural when said out loud than “list string”. Like “X is a string list” is a proper English sentence whereas “X is a list string” is not unless you add an “of”.

I find it slightly odd that the arguments to add are key, value, map, and not map, key, value, but whatever.

The convention in ML-like languages is that the object that you’re operating on is the last argument to the function. This often aids with partial application, allowing you to write “List.map (Map.add key value) my_list_of_maps” to add a given key-value pair to a list of maps without having to use an explicit lambda.

PS: Looking at the run.bat in your git repository, I’d suggest that you look into ocamlbuild. With it you can just write “ocamlbuild flathead.native” to compile flathead.ml and its dependencies into an executable without having to manually specify which files flathead.ml depends on or in what order they need to be compiled. It figures this out automatically. It also only re-builds files that have changed (like any proper build system).

Reply ↓
- ericlippert on February 9, 2016 at 9:32 am said:
  
  I’ll look into ocamlbuild, thanks!
  
  Reply ↓
James Hart on February 9, 2016 at 8:54 am said:

“Is computing the length of a string O(1) as it is in C#, or O(n) in the size of the string as strlen is in C? I honestly don’t know”

Isn’t that significant in terms of, for example, whether 0x00 is a valid character within an OCaml string? Given that you’re using this to store arbitrary chunks of binary data, that seems like a detail you might need to worry about…

Reply ↓
- alexstrada on February 9, 2016 at 9:00 am said:
  
  According to the source code (https://github.com/ocaml/ocaml/blob/616e5c32f19dc225df716c31a1ce4e5edc1ca3a6/byterun/str.c), it looks like String.length is O(1). It’s implemented in C function caml_string_length.
  
  Reply ↓
Pingback: Forest path | Fabulous adventures in coding
Kevin Jones (@vcsjones) on February 9, 2016 at 10:10 am said:

It does work fine on OS X. You can install the ocaml compiler with Homebrew, “brew install ocaml” and the compile and run steps are exactly the same. If anyone is interested in running it on OS X, here is a fork of Eric’s repository with a shell script for running it on OS X. I suspect it will work on *nix systems as well assuming the compiler is installed: https://github.com/vcsjones/flathead/tree/blog2

Reply ↓
- ericlippert on February 9, 2016 at 10:22 pm said:
  
  Thanks for the pull request. I’m heads-down in new employee orientation this week but I will try it out on my brand new mac ASAP!
  
  Reply ↓
Adrian on February 9, 2016 at 11:00 am said:

“It would seem unfortunate to have to count the size of a large immutable string every time you access memory”

I’m new to functional languages, but this would surprise me. I thought one of the major advantages of immutability was that functions can memoize results. I’d expect the first invocation to count and the later ones to returns the memoized value.

Reply ↓
- astrada on February 9, 2016 at 12:52 pm said:
  
  Yes, but OCaml, unlike Haskell, is not purely functional. It allows functions with side effects, so memoization is never implicit. You have to explicitly use module Lazy (or something similar).
  
  Reply ↓
El Q (@fquednau) on February 9, 2016 at 12:36 pm said:

Still bumping along nicely with F#, first post with a few changes. In F# we’ve got the array, which already comes with an efficient int-based index. I assume such a type does not exist in OCaml?

Reply ↓
- ericlippert on February 9, 2016 at 10:16 pm said:
  
  There are mutable array types in OCaml but I did not see an immutable one. Likely there is one and I just haven’t found it yet.
  
  Reply ↓
  - Chris Venus on February 10, 2016 at 8:20 am said:
    
    Is mutability/immutability why you used a string as your backing store? I don’t understand the language (but its fun learning about it! Yay adventures!) but an array of bytes or integers would seem like the obvious thing to have and would mean less having to wrangle integers to characters and vice versa. But there is no built in immutable integer array so you are (ab)using a string which has the properties you want and is immutable?
    
    Reply ↓
    - ericlippert on February 10, 2016 at 3:24 pm said:
      
      That’s correct. I’m learning as I go here! There is probably a better way.
      
      Reply ↓
      - Gabe on February 11, 2016 at 9:28 am said:
        
        If strings are immutable, that’s a relatively recent breaking change. Don’t be surprised to find string mutation in legacy code.
      - ericlippert on February 11, 2016 at 9:11 pm said:
        
        I think strings were made immutable between 4.01 and 4.02, which seems like an odd time to make a serious breaking change. I’d think serious breaking changes would happen on major revisions. Perhaps there was a plan to get it into 4.0, and they missed a deadline or something? I don’t know.
Pingback: The Morning Brew - Chris Alcock » The Morning Brew #2028
Pingback: Behind house | Fabulous adventures in coding
talpr on February 11, 2016 at 3:26 am said:

So this is a little nit-picky, but if there are types for address, bit number, and bit size, shouldn’t there also be a type for the memory/string sizes?

Reply ↓
- Chris Venus on February 11, 2016 at 3:36 am said:
  
  Also a wrapper type for strings to make it clear they are immutable byte arrays and not really strings!
  
  Reply ↓
- ericlippert on February 11, 2016 at 6:49 am said:
  
  That is a good point. I was somewhat surprised to see how much OCaml uses unadorned integers for in its own standard libraries. I have not applied my discipline consistently throughout this project! This is probably a good idea.
  
  Reply ↓
  - astrada on February 15, 2016 at 3:03 pm said:
    
    I was somewhat surprised to see how much OCaml uses unadorned integers for in its own standard libraries.
    
    The main reason is performance. The standard library provides a very thin but efficient abstraction layer. So that’s why it’s a little rough around the edges, and looks more like a C library than a Haskell library.
    
    Reply ↓
Eldc on February 14, 2016 at 12:54 am said:

> Note that since there are no statements per se, producing a failure exception is permitted as an expression, unlike in C#.

In C# I use a “T Throw(this Exception e) { throw e; }” function when I want to throw from a conditional expression as with failwith.

Reply ↓
Jakub on February 15, 2016 at 1:09 am said:

Does anyone know, how to do “module IntMap = Map.Make(struct type t = int let compare = compare end)” in F#? Thanks

Reply ↓
- sepp2k on February 15, 2016 at 5:04 am said:
  
  F# doesn’t have higher-order modules, so `Map` is simply a generic type parameterized over both the key and value types. So instead of `foo IntMap.t`, you’d use `Map` and instead of `IntMap.some_map_function` you’d use `Map.someMapFunction` or `myMap.someMapMethod`.
  
  Reply ↓
Jeff Nyman on July 7, 2019 at 5:18 am said:

I’m way late to this. As a note, I found the syntactic sugar parts don’t work at all.

The code mentioned is this:

let new_edits = IntMap.add address b bytes.edits in
{ bytes with edits = new_edits }

Or this:

let edits = IntMap.add address b bytes.edits in
{ bytes with edits }

Those, however, fail to work. You will get “Expression has type Type.byte_address but an expression was expected of type Type.IntMap.key = true.”

Beyond that, it’s not clear why — if these truly are better abstractions — they weren’t used. They definitely do seem to remove the “mechanism” a bit more and reveal the intent, which was one of the principles that was started out with.

Reply ↓
- Jeff Nyman on July 7, 2019 at 5:25 am said:
  Ah, responding to my own comment, I see the issue. It’s not made very clear by the post itself, but the logic would look like this:
```
else
  let (Byte_address addr) = address in
  let b = char_of_int value in
  let edits = IntMap.add addr b bytes.edits in
  { bytes with edits }
```
  If you don’t know OCaml and you’re reading these posts, I’m finding it’s easy to mistake when you should replace whole bits of code with just updates to parts of the code.
  
  This is a great series but so far it’s showing me that OCaml is a very ugly language and while the *theory* of functional programming seems great, the *practice* of it can lead to very difficult to read code.
  Reply ↓