You are atop the west wall of a great canyon, offering a marvelous view of the mighty Frigid River below. Across the canyon, the walls of the White Cliffs join the mighty ramparts of the Flathead Mountains. To the north, Aragain Falls may be seen, complete with rainbow. Even further upstream, the river flows out of a great dark cavern. To the west is an immense forest, stretching for miles. It seems possible to climb down into the canyon from here.
> go down
The walls of the river canyon may be climbable here. To the northeast is a narrow path.
> go northeast
Code for this and the next several episodes is at https://github.com/ericlippert/flathead/tree/blog9
We’re going to spend the next several episodes quoting the Z-machine specification (with some improvements in formatting) and decoding the instruction format. My goal here is to write code that clearly implements the specification. You can judge how well or poorly I do.
First, the general layout of the instruction:
A single Z-machine instruction consists of the following sections
- Opcode: 1 or 2 bytes
- (Types of operands): 1 or 2 bytes, in 4 or 8 2-bit fields
- Operands, between 0 and 8 of these: each 1 or 2 bytes
- (Store variable): 1 byte
- (Branch offset): 1 or 2 bytes
- (Text to print): An encoded string (of unlimited length)
Bracketed sections are not present in all opcodes. A few opcodes take both “store” and “branch”.
We will proceed from start to finish, parsing out the opcode, operand types, operands, store, branch and text. Let’s get started with the opcode.
Each instruction has a form: long, short, extended or variable
type opcode_form = | Long_form | Short_form | Variable_form | Extended_form
If the top two bits of the opcode are 11 the form is variable; if 10, the form is short. If the opcode is 190 and the version is 5 or later, the form is “extended”. Otherwise, the form is “long”.
Opcode 190 actually had no meaning in any version before 5, so it should not appear in any code. I can ignore the requirement to check the version number.
let decode_form address = let byte = read_byte address in match fetch_bits bit7 size2 byte with | 3 -> Variable_form | 2 -> if byte = 190 then Extended_form else Short_form | _ -> Long_form
Each instruction has … an operand count: 0OP, 1OP, 2OP or VAR.
I can’t start identifiers with numbers but I think this is just as clear:
type operand_count = | OP0 | OP1 | OP2 | VAR
Note that the so-called 2OP instructions are almost always but not guaranteed to have two operands! There is one instruction encoded as 2OP that has a variadic form.
- In short form, … if [bits 4 and 5 of the opcode byte are] 11 then the operand count is 0OP; otherwise, 1OP.
- In long form the operand count is always 2OP.
- In variable form, if bit 5 is 0 then the count is 2OP; if it is 1 then the count is VAR.
- In extended form, the operand count is VAR.
let decode_op_count address form = let b = read_byte address in match form with | Short_form -> if fetch_bits bit5 size2 b = 3 then OP0 else OP1 | Long_form -> OP2 | Variable_form -> if fetch_bit bit5 b then VAR else OP2 | Extended_form -> VAR
All right, we know the “form” of the instruction and we know the “count” of the instruction. But we still have not gotten out the opcode itself.
- In short form, … the opcode number is given in the bottom 4 bits.
- In long form … the opcode number is given in the bottom 5 bits.
- In variable form, … the opcode number is given in the bottom 5 bits.
- In extended form, … the opcode number is given in a second opcode byte.
Now what the spec does not say here clearly is: we are going to read 4, 5 or 8 bits, as specified above, but we need to figure out which of 100+ opcodes we’re talking about. The location of the bits depends on the form, but the meaning of the bits depends on the operand count. In fact the operation count is far more relevant here. It took me some time to puzzle out this section of the spec. The spec could more clearly say:
- In extended form the EXT opcode number is given in the following byte.
- If the operand count is 0OP then the 0OP opcode number is given in
the lower 4 bits.
- If the operand count is 1OP then the 1OP opcode number is given in
the lower 4 bits.
- if the operand count is 2OP then the 2OP opcode number is given in
the lower 5 bits
- If the operand count is VAR then the VAR opcode number is given in
the lower 5 bits
So, I’m going to need a list of all the opcodes. The convention used in the specification is to number the opcodes according to a combination of the operand count and the value of the byte which contains the opcode. Not every possible opcode is in use; I made an enumerated type of all the possible opcodes:
type bytecode = | OP2_1 | OP2_2 | OP2_3 | OP2_4 | OP2_5 | OP2_6 | OP2_7 | OP2_8 | OP2_9 | OP2_10 | OP2_11 | OP2_12 | OP2_13 | OP2_14 | OP2_15 ... and so on ...
And some code to implement the rules above:
let decode_opcode address form op_count = let b = read_byte address in match (form, op_count) with | (Extended_form, _) -> let maximum_extended = 29 in let ext = read_byte (inc_byte_addr address) in if ext > maximum_extended then ILLEGAL else ext_bytecodes.(ext) | (_, OP0) -> zero_operand_bytecodes.(fetch_bits bit3 size4 b) | (_, OP1) -> one_operand_bytecodes.(fetch_bits bit3 size4 b) | (_, OP2) -> two_operand_bytecodes.(fetch_bits bit4 size5 b) | (_, VAR) -> var_operand_bytecodes.(fetch_bits bit4 size5 b) in
These read the values out of arrays:
let one_operand_bytecodes = [| OP1_128; OP1_129; OP1_130; OP1_131; OP1_132; OP1_133; OP1_134; OP1_135; OP1_136; OP1_137; OP1_138; OP1_139; OP1_140; OP1_141; OP1_142; OP1_143 |]
and so on; I won’t list all of them. Major boring stuff.
One other thing: later on I’m going to need to know whether the next section of the instruction starts after one or two bytes of opcode:
let get_opcode_length form = match form with | Extended_form -> 2 | _ -> 1 in
We have gotten through the first thing on our list: we now know the form, argument count and opcode of the instruction. We haven’t quite made it through the first byte yet because we can still extract some operand typing information from the first byte, but I think that is enough for this episode. Next time on FAIC: we’ll see how to deduce the types of all the operands.
Porting your implementation to c# while maintaining as much as possible your functional approach is making me rediscover LINQ in so many ways. I think I’ve used
Aggregate(aka as Fold) more in these past few days than in all the time I’ve been coding since LINQ was introduced. It’s amazing how most of your recursive functions can be implemented using
Aggregate(with a little bit of
Zipwhen you need indexing).
Out of curiosity why did the LINQ implementers choose to change the method names and not use the ones normally used in functional languages; First -> Head, Select -> Map, Aggregate -> Fold, etc.?
I think LINQ got a lot of its terminology from database operations hence the “Query” part.
Select, Where, OrderBy
Yeah I guess that make sense. I was referring more to the IEnumerable extension methods that the query syntax ultimately translates to but it would be sort of strange if things were named completely different in query syntax and normal syntax.
I was not present when the names were picked but I’ve gotta tell you, the traditional function names are terrible. “Head” is of course orders of magnitude better than “car”, but “first” is better still. “Map”, “reduce” and “fold” are all awful; most map operations are *selecting a part of a thing*, so call it select. The analogy that “map” relates two things the same way that a symbol on a map relates to the thing in real life, that is an analogy that does not come easily to the casual reader. I hope it is clear that select-many must not be called “bind”. And I still to this day do not know why “fold” is called “fold”; plainly its purpose is to produce a summary. “Reduce” is better than “fold”, but “aggregate” is better still.
Propagating bad naming like that reinforces the undesirable notion that programming is an activity reserved to a priesthood that knows the sacred incantations, rather than an activity that encourages exploration by novices and experienced developers alike.
> And I still to this day do not know why “fold” is called “fold”.
The term folding comes from baking: The cake batter is the accumulator, and the flour is the list. You repeatedly sift in parts of the the flour (elements of the list), and fold them in to produce a new accumulator value (a batter with a different consistency). At the end you have a value that represents the fold operation applied to every element of the list.
Also, I completely made all of that up.
I’m totally stealing that.
I get that all the ‘default’ names are awful (altough it took me a while to agree because I needed to look at it with fresh eyes), but ‘map’ can be used for so much more than just ‘select’. I’ve always seen it as the ‘for-each’ of the FP world.
Maybe ‘with-each-do’ or ‘on-all-perform’ would be better?
‘Aggregate’ instead of ‘fold’ is beautiful though, I think.
A function maps values to other values. So when you extend that to taking a function and applying it to all values in a list, to generate another list, you have mapped the list. So calling your higher order function which does that “map” isn’t so strange.
I don’t agree that “select” is a better name than “map”. Select implies choosing some subset from a larger set, which makes sense in SQL when you’re choosing which columns to return, but it doesn’t make a lot of sense when what you’re really doing is applying a 1-to-1 mapping. If, for instance, I map a list of integers to the squares of those integers, I haven’t selected anything, I’ve applied the x => x*x function to all of them. Calling it “apply” would also be a good choice, I think.
I’m with you though on “fold” being a weird name. Then again, I’m kind of fond of Scala’s /: and operators (which if you squint and use your imagination, look kind of like dominos tipping from the left or right, respectively). So I can’t exactly claim to be devoted to giving everything the most obvious name.
I know this is an old post, but… calling Map, “Select”, because *most* maps are selects, seems insane to me. I don’t call driving, “getting groceries”, even though most of my driving is getting groceries. It smooths over initial confusion at the cost of confusion later, and its not even great at doing that: you either tell someone to accept the lambda syntax on blind faith, or you tell them that “select” applies the function to every element of the list/iterator/etc its operating on, at which point you’ve already explained Map. And “the function maps from the input to the output” sounds more natural to me than “the function selects from the input to the output”.
Anyways, thanks for the great articles 🙂
I assume you are not going to leave opcodes named OPx_yyy but later substitute actual descriptive names?
I am, for two reasons. First, because this is what the spec calls them, and I want to match the spec. Second, because there is not a 1-1 mapping from descriptive names to opcode numbers. Infocom changed the meanings over time.
That said, for most of them I could. But the code will be plenty easy enough to follow, don’t worry.