Behind house

You are behind the white house. Paths lead into the forest to the east and northeast. In one corner of the house is a small window which is slightly ajar.

> examine the window

The window is slightly ajar, but not enough to allow entry.

> open the window

With great effort, you open the window enough to allow entry.

> enter the house

Story state is, as we described last time, a big block of memory, typically more than would fit into an 8 bit machine like the Commodore 64. Remember, you’ve got to have room for the interpreter implementation in memory as well! Story state size is also typically more than can be addressed with a 16 bit pointer. In addition, when we save a game we actually are going to serialize out all the changes that have been made to memory so far, so it makes sense to keep all the mutable memory small and close together.

For these reasons memory is divided up into a “dynamic” region and a “static” region. The dynamic region is at the low addresses, the static region is at the top. Everything in the dynamic region can be accessed with a 16 bit pointer, but it is possible that some portions of static memory will be at addresses larger than 0xFFFF. The convention for this “high” memory is to use 17 (or, in later versions, more) bit pointers where the bottom bit is zero. Obviously that thing can fit into 16 bits. The price you pay is of course that everything in “high” memory must begin on an even boundary for it to be addressible by only 16 bits, and that you have to remember to do a computation on the 16 bit “packed” pointer before you use it as an actual offset into memory.

The interpreter on a physical-memory-constrained machine would typically leave dynamic “low” memory in physical memory and then page the “high” static code and strings in and out of physical memory as needed. Of course we are many decades from needing to implement such a paging system at the user level; the operating system takes care of all that for us. So we’ll just keep everything in memory all the time.

That said, it is still convenient to keep a strong distinction between dynamic and static memory; first, because I want to ensure that static memory is never written to; that would be a bug in either the game or the interpreter. And second, because for saving, restoring and restarting games we need to be able to say “give me the original state of dynamic memory”.

So that’s the first constraint on our new memory manager. The second is that the Z-machine traffics primarily in 16-bit integers called “words”. We’ll need a way to easily read and write words. Again, of course, writing a word means “make a new version of the story which has the change”.

Let’s put it all together and start writing some code. Code for this episode can be found at https://github.com/ericlippert/flathead/tree/blog3

To begin with, I want to make sure that I never conflate the address of a word with the address of a byte unless I do so deliberately. Once again the type system helps us by allowing us to tag an integer:

type word_address = Word_address of int

I also need a bunch of helper methods. Remember, there is no job too small to put into a helper method. Small helper methods are the methods that are most easily seen to be obviously correct, and therefore are a solid foundation upon which to build more complex code.

I will need to be able to increment and decrement byte pointers:

let inc_byte_addr_by (Byte_address address) offset =
  Byte_address (address + offset)
  
let dec_byte_addr_by address offset =
  inc_byte_addr_by address (0 - offset)

I am not normally a big fan of abbrvs in method names but these seem perfectly clear so I’m going to make an exception.

I am going to need to be able to say “given a byte address and a string, what is the byte at that offset in the string?”

let dereference_string address bytes =
  if is_out_of_range address (String.length bytes) then
    failwith "address out of range"
  else
    let (Byte_address addr) = address in
    int_of_char bytes.[addr]

I need to be able to say “given the address of a word, what are the addresses of the low and high bytes?” Note that in the Z-machine, words are always stored with the high byte at the lower address:

let address_of_high_byte (Word_address address) =
  Byte_address address
  
let address_of_low_byte (Word_address address) = 
  Byte_address (address + 1)

Again, I hope you appreciate the obvious correctness of these functions, and note how they maintain type safety. I cannot possibly mix up word addresses and byte addresses by accident. If you allow a developer to write a bug, they will! I am no exception to that rule.

That’s enough helper methods for now. I want my new abstract data type to support:

Reading and writing bytes and words from dynamic memory
Reading bytes and words from static memory

And later on we will need the feature of “give me the original bytes of dynamic memory”, but I’m not going to discuss that now.

I’ll put this in a new module called story.ml:

open Utility
open Type

type t =
{
  dynamic_memory : Immutable_bytes.t;
  static_memory : string;
}

Dynamic memory can change, so we’ll use our immutable byte-array change tracking system. Static memory cannot change so we’ll just use an immutable string, since in OCaml, chars are bytes.

We need a factory. I’ll have it take two strings, rather than a string and an immutable byte array:

let make dynamic static =
{
    dynamic_memory = Immutable_bytes.make dynamic;
    static_memory = static;
}

Our byte reader is perfectly straightforward: it just checks to see whether it should be reading out of dynamic or static memory, and then does so.

Again, notice how nothing here is explicitly typed. All the ugly explicit typing I hid away in one-liner helper functions!

let read_byte story address =
  let dynamic_size = Immutable_bytes.size story.dynamic_memory in
  if is_in_range address dynamic_size then
    Immutable_bytes.read_byte story.dynamic_memory address
  else
    let static_addr = dec_byte_addr_by address dynamic_size in
    dereference_string static_addr story.static_memory

The word reader is just two reads of bytes:

      
let read_word story address =
  let high = read_byte story (address_of_high_byte address) in
  let low = read_byte story (address_of_low_byte address) in
  256 * high + low

And writing is trivial. The range checking is done by the lower-level methods.

let write_byte story address value =
  let dynamic_memory = Immutable_bytes.write_byte story.dynamic_memory address value in
  { story with dynamic_memory }

let write_word story address value =
  let high = (value lsr 8) land 0xFF in
  let low = value land 0xFF in
  let story = write_byte story (address_of_high_byte address) high in
  write_byte story (address_of_low_byte address) low

Did you notice something about that last function? It looks like I mutated the variable story even though I said that variables are immutable in OCaml. What is up with that?

We started with a parameter story which was in scope throughout the method. I then shadowed it with another variable also called story. In OCaml, unlike C#, it is both legal and common to have one local shadow another. They need not even be of the same type! We will make use of this later.

So: I took the original story as a parameter, I made a mutated version of it by writing a byte and assigned that to another variable called story. Then I took that thing as an argument to the second call to write_byte and returned a changed version of it as the result of the function. The net result is a story with two changes.

All this memory management is well and good but at some point we’re actually going to have to get a story file off disk and into memory.

Next time on FAIC: we’ll do just that.

12 thoughts on “Behind house”

taradinoc on February 10, 2016 at 9:40 am said:

I imagine you’ll get into address packing and other version-specific details soon, so this might come in handy if you haven’t come across it already: all the version differences gathered from across the spec and summarized.

http://hansprestige.com/inform/zmachine_versions.html

Reply ↓
- ericlippert on February 10, 2016 at 3:27 pm said:
  
  That’s handy, thanks!
  
  Reply ↓
Pingback: Up a tree | Fabulous adventures in coding
Michael on February 10, 2016 at 4:39 pm said:

I find that implicit typing can be confusing. For example, as a reader of the code, how can I quickly tell whether the address argument of read_byte is a Byte_address or a Word_address?

Reply ↓
- Yawar on November 18, 2016 at 8:39 pm said:
  
  In OCaml you can annotate values and functions either inline, just before their definitions, or even in separate (interface) files. For an example of the middle approach which makes it obvious what all the types are, see https://ericlippert.com/2016/02/10/behind-house/#comment-61738
  
  Reply ↓
Mats on February 11, 2016 at 1:05 am said:

Highly interesting! I built a Gameboy emulator in C++ a while ago just for the fun of it, and will probably do Z machine in a functional language later on.

A small question though. You have two function, Immutable_bytes.read_byte and dereference_string which have different signatures, even though they are similar (take a state and an adress and return the value).

Immutable_bytes.read_byte story.dynamic_memory address
dereference_string static_addr story.static_memory

Was that intentional or just a mix up? I noticed that when I made my own emulator it helped to have consistent function signatures when writing my different read/write functions on the memory, bus and cartridge classes.

Reply ↓
- ericlippert on February 11, 2016 at 6:46 am said:
  
  That looks like a screw-up on my part! Nice find.
  
  Reply ↓
Pingback: The Morning Brew - Chris Alcock » The Morning Brew #2029
Thomas ten Cate on February 11, 2016 at 1:28 am said:

Any plans to make the state into a monad? In Haskell this would be the natural approach; I’m not familiar with OCaml (so thanks for the tutorial!) but I’d expect it to be similar. Instead of explicitly shunting states around, in Haskell you can use the “do” notation syntactic sugar to write almost imperative-style code where it makes sense, without actually losing any of the pure functional aspects.

Reply ↓
- ericlippert on February 11, 2016 at 6:47 am said:
  
  I considered it, and then I decided to learn to crawl before I run.
  
  Reply ↓
Pingback: Kitchen | Fabulous adventures in coding
wtetzner on October 7, 2016 at 11:08 am said:

Another way to wrap types is to use the module system, which allows function names to be shorter, since they’re namespaced by the module name.

For example, for your Byte_address type, you could do something similar to this:

module Byte_address : sig type t val inc : t -> int -> t val dec : t -> int -> t val of_int : int -> t end = struct type t = int let inc address offset = address + offset let dec address offset = address - offset let of_int address = address end

This also gives you more control over which functions can actually extract the int from the type. If you just use a label, then you can pattern match to extract the int, but with a module using abstract types, only the functions inside of the module can freely access the actual type.

Reply ↓