You are atop the west wall of a great canyon, offering a marvelous view of the mighty Frigid River below. Across the canyon, the walls of the White Cliffs join the mighty ramparts of the Flathead Mountains. To the north, Aragain Falls may be seen, complete with rainbow. Even further upstream, the river flows out of a great dark cavern. To the west is an immense forest, stretching for miles. It seems possible to climb down into the canyon from here.
> go down
The walls of the river canyon may be climbable here. To the northeast is a narrow path.
> go northeast
Code for this and the next several episodes is at https://github.com/ericlippert/flathead/tree/blog9
We’re going to spend the next several episodes quoting the Z-machine specification (with some improvements in formatting) and decoding the instruction format. My goal here is to write code that clearly implements the specification. You can judge how well or poorly I do.
First, the general layout of the instruction:
A single Z-machine instruction consists of the following sections
- Opcode: 1 or 2 bytes
- (Types of operands): 1 or 2 bytes, in 4 or 8 2-bit fields
- Operands, between 0 and 8 of these: each 1 or 2 bytes
- (Store variable): 1 byte
- (Branch offset): 1 or 2 bytes
- (Text to print): An encoded string (of unlimited length)
Bracketed sections are not present in all opcodes. A few opcodes take both “store” and “branch”.
We will proceed from start to finish, parsing out the opcode, operand types, operands, store, branch and text. Let’s get started with the opcode.
Each instruction has a form: long, short, extended or variable
type opcode_form = | Long_form | Short_form | Variable_form | Extended_form
If the top two bits of the opcode are 11 the form is variable; if 10, the form is short. If the opcode is 190 and the version is 5 or later, the form is “extended”. Otherwise, the form is “long”.
Opcode 190 actually had no meaning in any version before 5, so it should not appear in any code. I can ignore the requirement to check the version number.
let decode_form address = let byte = read_byte address in match fetch_bits bit7 size2 byte with | 3 -> Variable_form | 2 -> if byte = 190 then Extended_form else Short_form | _ -> Long_form
Each instruction has … an operand count: 0OP, 1OP, 2OP or VAR.
I can’t start identifiers with numbers but I think this is just as clear:
type operand_count = | OP0 | OP1 | OP2 | VAR
Note that the so-called 2OP instructions are almost always but not guaranteed to have two operands! There is one instruction encoded as 2OP that has a variadic form.
- In short form, … if [bits 4 and 5 of the opcode byte are] 11 then the operand count is 0OP; otherwise, 1OP.
- In long form the operand count is always 2OP.
- In variable form, if bit 5 is 0 then the count is 2OP; if it is 1 then the count is VAR.
- In extended form, the operand count is VAR.
let decode_op_count address form = let b = read_byte address in match form with | Short_form -> if fetch_bits bit5 size2 b = 3 then OP0 else OP1 | Long_form -> OP2 | Variable_form -> if fetch_bit bit5 b then VAR else OP2 | Extended_form -> VAR
All right, we know the “form” of the instruction and we know the “count” of the instruction. But we still have not gotten out the opcode itself.
- In short form, … the opcode number is given in the bottom 4 bits.
- In long form … the opcode number is given in the bottom 5 bits.
- In variable form, … the opcode number is given in the bottom 5 bits.
- In extended form, … the opcode number is given in a second opcode byte.
Now what the spec does not say here clearly is: we are going to read 4, 5 or 8 bits, as specified above, but we need to figure out which of 100+ opcodes we’re talking about. The location of the bits depends on the form, but the meaning of the bits depends on the operand count. In fact the operation count is far more relevant here. It took me some time to puzzle out this section of the spec. The spec could more clearly say:
- In extended form the EXT opcode number is given in the following byte.
- If the operand count is 0OP then the 0OP opcode number is given in
the lower 4 bits.
- If the operand count is 1OP then the 1OP opcode number is given in
the lower 4 bits.
- if the operand count is 2OP then the 2OP opcode number is given in
the lower 5 bits
- If the operand count is VAR then the VAR opcode number is given in
the lower 5 bits
So, I’m going to need a list of all the opcodes. The convention used in the specification is to number the opcodes according to a combination of the operand count and the value of the byte which contains the opcode. Not every possible opcode is in use; I made an enumerated type of all the possible opcodes:
type bytecode = | OP2_1 | OP2_2 | OP2_3 | OP2_4 | OP2_5 | OP2_6 | OP2_7 | OP2_8 | OP2_9 | OP2_10 | OP2_11 | OP2_12 | OP2_13 | OP2_14 | OP2_15 ... and so on ...
And some code to implement the rules above:
let decode_opcode address form op_count = let b = read_byte address in match (form, op_count) with | (Extended_form, _) -> let maximum_extended = 29 in let ext = read_byte (inc_byte_addr address) in if ext > maximum_extended then ILLEGAL else ext_bytecodes.(ext) | (_, OP0) -> zero_operand_bytecodes.(fetch_bits bit3 size4 b) | (_, OP1) -> one_operand_bytecodes.(fetch_bits bit3 size4 b) | (_, OP2) -> two_operand_bytecodes.(fetch_bits bit4 size5 b) | (_, VAR) -> var_operand_bytecodes.(fetch_bits bit4 size5 b) in
These read the values out of arrays:
let one_operand_bytecodes = [| OP1_128; OP1_129; OP1_130; OP1_131; OP1_132; OP1_133; OP1_134; OP1_135; OP1_136; OP1_137; OP1_138; OP1_139; OP1_140; OP1_141; OP1_142; OP1_143 |]
and so on; I won’t list all of them. Major boring stuff.
One other thing: later on I’m going to need to know whether the next section of the instruction starts after one or two bytes of opcode:
let get_opcode_length form = match form with | Extended_form -> 2 | _ -> 1 in
We have gotten through the first thing on our list: we now know the form, argument count and opcode of the instruction. We haven’t quite made it through the first byte yet because we can still extract some operand typing information from the first byte, but I think that is enough for this episode. Next time on FAIC: we’ll see how to deduce the types of all the operands.