You are at the periphery of a large dome, which forms the ceiling of another room below. A wooden railing protects you from a precipitous drop.
> tie the rope to the railing
The rope drops over the side and comes within ten feet of the floor.
> climb down
OK, we’re getting good at this, so let’s go a little faster. I’m going to post the code at https://github.com/ericlippert/flathead/tree/blog6.
The most important thing to know about the dictionary is: to save space, only the first 6 (or 9, in some versions) characters of a word are actually meaningful to the game. Anything after that is truncated. It is perfectly legal when playing Mini-Zork to say “pick up the screwdribble” and the game will cheerfully pick up the screwdriver, because it stopped disambiguating words after “screwd”.
The dictionary of words recognized by the game is stored starting at the byte address found in the word beginning at offset 8 in the story. At that address, memory is organized as follows:
- One byte contains the number of “word separator” characters. If the game parser wants characters other than spaces to separate words, they are listed here.
- Then follows that many word separator characters.
- After that comes one byte that gives the number of bytes in each dictionary table entry. Since words are six characters, which is four bytes of zstring, a dictionary entry has to be at least that many bytes. But a dictionary entry is permitted to be larger; the game may store data of its choice – likely having to do with how sentences are parsed but to be honest I do not know what is stored here – after the four bytes of zstring text. Every entry is of the same size.
- After that comes two bytes that give the number of dictionary entries.
- And after that comes the actual entries: string data first, and then extra game data.
Parsing all this out is pretty straightforward given the tools we’ve built. I’ve added more wrapper types so that the compiler helps catch my mistakes:
type dictionary_base = Dictionary_base of int type dictionary_table_base = Dictionary_table_base of int type dictionary_address = Dictionary_address of int type dictionary_number = Dictionary of int
I didn’t bother to add special types for the tiny little word separator table. Later on we will see that the Z-machine needs to track the addresses of individual dictionary entries, but it is convenient for me when trying to understand the structure to think about indices into that table. Hence I’ve got both dictionary numbers and dictionary addresses. Again, I want to use these wrapper types so that the compiler tells me when I’ve made a mistake.
Getting the dictionary base address out of the story is straightforward:
let dictionary_base story = let dictionary_base_offset = Word_address 8 in Dictionary_base (read_word story dictionary_base_offset)
And I think I can let the decoding logic pass without further comment:
let word_separators_base (Dictionary_base base) = Byte_address base let word_separators_count story = let dict_base = Story.dictionary_base story in let ws_base = word_separators_base dict_base in Story.read_byte story ws_base let entry_base story = let dict_base = Story.dictionary_base story in let ws_count = word_separators_count story in let ws_base = word_separators_base dict_base in inc_byte_addr_by ws_base (ws_count + 1) let entry_length story = Story.read_byte story (entry_base story) let entry_count story = let (Byte_address addr) = inc_byte_addr (entry_base story) in Story.read_word story (Word_address addr) (* This is the address of the actual dictionary entries. *) let table_base story = let (Byte_address addr) = inc_byte_addr_by (entry_base story) 3 in Dictionary_table_base addr let entry_address story (Dictionary dictionary_number) = let (Dictionary_table_base base) = table_base story in let length = entry_length story in Dictionary_address (base + dictionary_number * length) let entry story dictionary_number = let (Dictionary_address addr) = entry_address story dictionary_number in Zstring.read story (Zstring addr)
All right, we can now ask for the number of dictionary entries and read out each string.
This is major boring code right here and I want to talk about something interesting, so let’s see how we might accumulate all of those strings to display them.
We’ve already seen how to run loops using tail-recursive auxilliary functions, but I want to be even a bit more functional than that. The defining characteristic of functional programming is that functions are first class, so let’s give an example.
In this series I’m going to very frequently want to turn a collection of numbered things — like dictionary words — into a string summarizing them. Rather than writing those little aux loops to accumulate a result, I want to write something a bit higher level. I want a helper that lets me express “turn each item from 0 to n-1 into a string and concatenate the results”. Here’s what I came up with:
let accumulate_strings_loop to_string start max = let rec aux acc i = if i >= max then acc else aux (acc ^ (to_string i)) (i + 1) in aux "" start
Let’s take a look at this carefully. We have three arguments. The start and max arguments are straightforward integers; we will loop starting at start, and stop when we get to max. But what is to_string? It’s a function that takes an integer and produces a string; in C# terms it would be a delegate.
We have our standard tail-recursive auxilliary loop that takes an accumulator and either returns the accumulator or does tail recursion. The recursion takes a new value for the accumulator, which is the result of calling
to_string concatenated to the previous accumulation.
As we discussed previously, I don’t really care that repeated string concatenation is n-squared here. The vast majority of times I use this I’ll be just dumping things for debugging purposes or building small strings. If I had a real performance problem then I’d use some kind of string builder, just as I would in C#.
Now that I have this little helper method I can use it to build me a list of all the dictionary words:
let display story = let count = entry_count story in let to_string i = Printf.sprintf "%s " (entry story (Dictionary i)) in accumulate_strings_loop to_string 0 count
My little to_string function takes an int and produces a string. Notice that of course the nested to_string function is closed over the parameters of its containing function; that is, it can freely use
story without taking it as a parameter.
And now if we run this thing…
let dict = Dictionary.display story in Printf.printf "%s\n" dict
… then we can see every word meaningful in Mini-Zork:
$ve . , #comm #rand #reco #unre " a about across activa advent again air air-p all altar an ancien and answer apply around art ask at attach attack awake away ax axe bag banish bare basket bat bathe beauti beetle behind bell below beneat birds bite black blade block bloody blow board boarde boards boat bodies body bolt bones ...
(The $ve, #rand, and so on words allow the developers to tweak aspects of game play for testing. Re-seeding the random number generator, for instance.)
Next time on FAIC: We’ll start decoding the object tree.
That is indeed what the extra data is used for. The format is defined by the game library, so games compiled by Inform use a different format from Infocom’s ZIL games, and the format even changed from one ZIL game to another as Infocom’s tools evolved.
For the curious, there’s a package called ztools that includes utilities for examining story files, including txd (for disassembling code) and infodump (for printing tables). If you run infodump -d -c 1 minizork.z3, you’ll get output like this:
[ 385] @ $32e1 s [13 1c 00] <dir>
[ 386] @ $32e8 sack [80 01 00] <noun>
[ 387] @ $32ef sailor [80 01 00] <noun>
[ 388] @ $32f6 sand [80 01 00] <noun>
[ 389] @ $32fd sandwi [80 01 00] <noun>
[ 390] @ $3304 sapphi [a0 01 b6] <adj> <noun>
The first byte in brackets is a set of flags classifying the word (“parts of speech”), and the remaining bytes are numeric identifiers for up to two parts of speech(*): “s” is direction 0x1c, “sapphi” is adjective 0xb6(**). The game uses the parts of speech for heuristic parsing, and the data bytes to save time and space when mapping a parsed sentence into a game action.
(* The logic for deciding which data byte belongs to which part of speech is baroque, and infodump doesn’t bother.)
(** And all nouns are 0x01 for some reason.)
It’s unusual to see code with all these single constructor sum types (not that it’s bad). More common ways would be using abstract types:
module Abstract_int() : sig
val to_int : t -> int
val of_int : int -> t
end = struct
type t = int
let of_int t = t
let to_int t = t
module Bit_number = Abstract_int()
module Bit_size = Abstract_int()
or don’t even bother with types, or rely on labels instead:
let f ~address = ..
but that doesn’t work as well when you want to label return values too.
Also it looks like you’re not writing .mli’s for your .ml’s, you should. Except for modules that are mostly types, it’s better style, it makes the interface of modules explicit rather than implicit and intertwined with code, and it allows the compiler to give warnings about dead code.
You should probably tweak the way you call the compiler: the defaults are not good. -strict-sequence, -safe-string, -strict-formats for instance should be the default, especially for new code. -bin-annot to get type-at-point functionality in the editor.
Also a number of useful warnings are not on by default, you might want to pass -w +a and turn off the annoying ones as they come (+a-40-42 to turn on everything except warnings 40 and 42 for instance).
Not very important, but Invalid_argument is usually (in the stdlib at least) the exception raised when preconditions are not satisfied, like index out of bounds, instead of Failure.
Thanks! I am very new at this and not at all clear on what are the best practices.
Out of curiosity; I’m guessing OCaml has something similar to Haskell’s
foldl. Why not use an existing tool? In Haskell,
accumulate_strings_loopcould be rewritten as
foldl (\acc i -> acc ++ (toString i)) "" [start..(max - 1)]which IMHO is more readable. Basically
foldris how you iterate in Haskell and I’m guessing in most functional languages.
You can remove most of the parenthesis to make it even cleaner :
foldl (\acc i -> acc ++ toString i) "" [start..max - 1]
Oh no, I’m starting to suffer the Haskell “make my code as concise as possible” syndrome…
Good idea. It does have left and right folds, and I will use them in later episodes. I decided to not try to introduce the higher-order functional programming idiom at this time.
Pingback: The Morning Brew - Chris Alcock » The Morning Brew #2042
Just to be clear, your accumulate_strings_loop function operates on the half-open range [start, max) much like the C++ Standard Library algorithms that take an iterator range.
I’ve had my own fair share of struggles figuring out whether the endpoint is inclusive or exclusive (do we operate upon max?).
The convention I’ve adopted, for what it’s worth, is that, in C++, I always adhere to the half-open convention, and name them begin and end, in keeping with the Standard Library.
Most other languages I’ve used do not use the half-open convention in their standard libraries, so I prefer to name the endpoint for a half-open range too_far.
No one ever asks me why too_far got skipped. They used to often ask me why max got skipped when I used max.