Living room

The living room opens to the east. To the west is a wooden door, above which is strange gothic lettering. The door is nailed shut. There is a trophy case here, and a large oriental rug in the center of the room. A battery-powered brass lantern is on the trophy case. Above the trophy case hangs an elvish sword of great antiquity.

> open the trophy case

Opened.

> put the egg in the trophy case

Done.

> read the strange lettering

The engravings translate to “This space intentionally left blank.”

> move the rug

You drag the rug to one side of the room, revealing a closed trap door.

> open the trap door

The door reluctantly opens to reveal a rickety staircase descending into darkness.

> take the lamp and the sword

brass lantern: Taken.
sword: Taken.

> go east
> go up

If you were to examine the story file with a binary reader you would not find very much ASCII text in it, which might be a little surprising, since these games are pretty heavy on the text resources. Since they are so heavy on text resources, text is compressed in two simple ways, in a format called the zstring format:

every two bytes of zstring data contains three 5-bit characters; the remaining bit signals whether there is more text to come (0) or we’re done (1).
the 96 most common text fragments are assigned “abbreviation numbers”; a special code is used to mean “insert the given fragment here” in the middle of a string.

Let’s start by looking for the list of the 96 abbreviations.

(Code for this episode can be found at https://github.com/ericlippert/flathead/tree/blog4)

The address of the abbreviations table is in a word starting at byte 24. The contents are a 16 bit pointer to memory. We can simply represent the abbreviations table base pointer as an integer NO WAIT WE SAID WE WOULDN’T DO THAT.

That was a close one.

This particular integer has very special meaning, and we’re going to write special code to deal with handling this kind of pointer and only this kind of pointer. So let’s right now make sure that we never use this thing as anything other than the base of the abbreviations table. Let’s make the compiler tell us if we ever screw up and try to use this thing as any other kind of integer.

Moreover: I just said that there were 96 possible abbreviations. Abbreviations will be referred to by a number between 0 and 95. Again, it seems that a likely bug is that we’re going to use one of those things where we should be using the address of a string, the packed address of a string, a character, the table base,… there are a dozen different bugs we could write if we keep abbreviation indices as plain numbers, so let’s not do that. It costs us almost nothing to prevent these bugs.

type abbreviation_number = Abbreviation of int
type abbrev_table_base = Abbreviation_table_base of int

And we can extract the table base from the story here:

let abbreviations_table_base story =
  let abbreviations_table_base_offset = Word_address 24 in
  Abbreviation_table_base (read_word story abbreviations_table_base_offset)

Again: I love me some two-line long methods. I very rarely manage to fit a bug into only two lines.

OK, now, how do we decode the abbreviation table? The abbreviation table is pretty much the simplest table in the whole Z-machine. It is 96 words long, starting from the base pointer. Each word contains a 17 bit pointer; the pointer has been divided by two in order to fit into a word. That pointer then points to a compressed string.

I just introduced a whole pile of new concepts there. Let’s make types to represent those concepts. We’ll call the compressed 17 bit pointers “word zstring addresses” because that’s what the Z-machine spec calls them. We’ll call the uncompressed 17 bit pointers “zstring addresses”:

type word_zstring_address = Word_zstring of int
type zstring_address = Zstring of int

OK, things are ticking along nicely here. Let’s write some code. Suppose we have a compressed zstring address in hand. How do we make an uncompressed zstring address?

let decode_word_address (Word_zstring word_address) =
  Zstring (word_address * 2)

Again: not only is this a one-line-long guaranteed-correct method, I now know that if I ever try to mix compressed and uncompressed pointers, the program will not even compile. Loving it.

I need a helper method that increments a word address by a given number of words.

 
let word_size = 2

let inc_word_addr_by (Word_address address) offset =
  Word_address (address + offset * word_size)
  
let inc_word_addr address =
  inc_word_addr_by address 1

And I need a way to change the abbreviation table base pointer into a word pointer, since the thing is a list of 96 words. Trivially done:

let first_abbrev_addr (Abbreviation_table_base base) =
  Word_address base

All right, we are now set to answer the question: given an abbreviation number from 0 to 95, what is the uncompressed pointer to a zstring associated with that number?

let abbreviation_zstring story (Abbreviation n) =
  if n < 0 || n >= abbreviation_table_length then
    failwith "bad offset into abbreviation table"
  else
    let base = first_abbrev_addr (Story.abbreviations_table_base story) in
    let abbr_addr = inc_word_addr_by base n in
    let word_addr = Word_zstring (Story.read_word story abbr_addr) in
    decode_word_address word_addr

And if we’ve done this right, what’s popped out the other end is an uncompressed 17 bit pointer to a block of string data.

Next time on FAIC: we’ll make a start at decoding that string data.

10 thoughts on “Living room”

Pingback: Kitchen | Fabulous adventures in coding
shotgunninja0 on February 15, 2016 at 7:30 am said:

My question is: Why go through all of this trouble, in the modern day? It’s true that the Z-machine is a work of art, clearly, and that in the heyday of text-based games, such pointer manipulation is possible. However, it’s 2016, and we have faster processors, more memory, and larger hard disks with better read/write times. Why are we still making things harder and introducing more low-level programming concepts for what is basically dialogue tree traversal, when we could be just pulling in strings as needed?

Reply ↓
- ericlippert on February 15, 2016 at 7:57 am said:
  
  More modern text adventurer interpreters like Glulx support unencoded Unicode strings. But ironically, the fact that there are so many Z-machine interpreters out there means that the Z-machine format is still an attractive target.
  
  But again, I’m doing this for fun and to learn OCaml, not because I want to have a modern interpreter that takes advantage of what modern machines have to offer.
  
  Reply ↓
Dan Olson on February 15, 2016 at 1:01 pm said:

You may be interested in my F# zmachine, although I kind of stopped just before implementing dictionary support so it’s a little underwhelming. I write in F#’s verbose mode so it’s rather similar to o’caml, and I tried to be idiomatic with it rather than just writing in procedural style like a lot of people seem to do with F#. This is the third zmachine I’ve written and I think it’s a great project for playing with unfamiliar languages. The code is probably a bit messy, but hopefully understandable. I was going for succinctness.

https://github.com/olson-dan/fszm/blob/master/fszm/Program.fs

Reply ↓
- ericlippert on February 15, 2016 at 5:29 pm said:
  
  I’ll check it out!
  
  Reply ↓
tejón on February 15, 2016 at 9:37 pm said:

Lagging a bit, but here’s some more of my Haskell follow-along:

http://squangen.com/?p=629

Reply ↓
Pingback: The Morning Brew - Chris Alcock » The Morning Brew – #2032
Pingback: Attic | Fabulous adventures in coding
pbradshaw on February 17, 2016 at 10:34 am said:

This “character compression’ scheme reminds me a little of SIXBIT encoding in the old Dec20 mainframe. That mainframe had 36 bit registers/”words”, so encoding ASCII into SIXBIT was useful for stuffing an entire word with 6 characters. It also allowed most of the printable characters you’d actually need (minus lower case). There was also a less-standard 7-bit ASCII encoding which gave you 5 characters per word, and the high order bit was used as the terminator as described above. This gave you lower-case characters (basically, the full ASCII standard set of characters up to 127).

Reply ↓
- marcbrooks on February 17, 2016 at 10:44 pm said:
  
  Or even packed ASCII (6 bits per character, packed two characters to one 12-bit memory word) on the DEC PDP-8’s like the first machine I coded on.
  
  Reply ↓