This is a small room with passages to the east, northeast and south, and a forbidding hole leading west. Bloodstains and deep scratches (perhaps made by an axe) mar the walls. A troll, brandishing a bloody axe, blocks all passages out of the room.
> kill the troll with the elvish sword
The troll is struck on the arm; blood begins to trickle down.
The axe crashes against the rock, throwing sparks!
> again
The fatal blow strikes the troll square in the heart: he dies. As the troll breathes his last breath, a cloud of sinister black fog envelops him; when it lifts, the carcass is gone.
> go east
In this episode we’re going to look at one of the most powerful feature of OCaml: pattern matching.
Code for this episode can be found at https://github.com/ericlippert/flathead/tree/blog5
Recall that our attack on the problem of decoding compressed zstrings into real strings is to implement a state machine. That is, given a current character and current state, what is the new state and the new output — the output being a fragment of a string. Here’s the code:
let process_zchar (Zchar zchar) state = match (zchar, state) with | (1, Alphabet _) -> ("", abbrev0) | (2, Alphabet _) -> ("", abbrev32) | (3, Alphabet _) -> ("", abbrev64) | (4, Alphabet _) -> ("", alphabet1) | (5, Alphabet _) -> ("", alphabet2) | (6, Alphabet 2) -> ("", Leading) | (_, Alphabet a) -> (alphabet_table.(a).(zchar), alphabet0) | (_, Abbrev Abbreviation a) -> let abbrv = Abbreviation (a + zchar) in let addr = abbreviation_zstring story abbrv in let str = read story addr in (str, alphabet0) | (_, Leading) -> ("", (Trailing zchar)) | (_, Trailing high) -> let s = string_of_char (Char.chr (high * 32 + zchar)) in (s, alphabet0)
Wow, there is a lot going on here. This is a very violent introduction to the matching syntax. But we’ll get through it!
First off, I’ve created yet another simple wrapper type, this one around characters extracted from zstring encodings.
OCaml allows you to create tuples by simply making a parenthesized, comma-separated list of values. So (zchar, state)
is just a newly-created two-element tuple containing the character and the state. Similarly, on the next line, ("", abbrev0)
is a two-element tuple containing a string and a state.
match
means “I’m going to give you an expression and a pile of bar-separated patterns that might match this expression. Go down the list and find the first pattern that matches this expression; the thing to the right of the arrow is the value I want you to produce for this case.” It’s sort of like an “if-then” and sort of like a switch in C#, but it operates on expressions and has a very powerful pattern matching syntax. (An expression-matching syntax is proposed for C# 7 and I would love to have it!)
Let’s look at the patterns. Each pattern is of course a tuple pattern, so we ask “does the first pattern in the tuple pattern match the first element in the tuple? (that is, the zchar.) Does the second pattern in the tuple pattern match the second element in the tuple? (that is, the state)” If we get both matches, then the pattern as a whole matches.
Let’s look at the individual patterns. The patterns 1, 2, 3, 4, 5, 6 are pretty obvious; these are just ordinary “switch” cases. If the zchar equals that value then the pattern matches.
The pattern Alphabet _
means “match any state that was constructed with the Alphabet constructor, and I don’t care what value was given to the constructor”.
So the first match case means “if the zchar is 1 and the state is any alphabet state then the result is an empty string and the new state is abbrev0.
Now that you understand that, the next few patterns should be easy to understand. Right up to
| (_, Alphabet a) -> (alphabet_table.(a).(zchar), alphabet0)
The _ pattern means “match any zchar value”. The Alphabet a
pattern means “match any state created with the Alphabet constructor, and I would like to name the value that was passed to the constructor a
. This value is of course an integer, so we can then use it as the index into an array to fetch the string we want. (And of course the new state goes back to alphabet 0 for processing the next character.)
This one is a little bit interesting:
| (_, Abbrev Abbreviation a) -> let abbrv = Abbreviation (a + zchar) in let addr = abbreviation_zstring story abbrv in let str = read story addr in (str, alphabet0)
Notice that we are unwrapping two levels of wrapping here. We want a state that was constructed with the Abbrev constructor, whose argument was constructed with the Abbreviation constructor, which was constructed with an integer, call it a
.
Notice also that in this code I’m assuming that we have a method read
that can take a Zstring
address and produce a string, but this is precisely what we are attempting to write! I’ll leave implementation of this method until next time; if you think that it is going to have to somehow be recursive with respect to process_zchar
, you’re right.
Next time on FAIC: let’s put together everything we’ve seen so far in string decoding.
Is `Zchar` really that necessary here? Maybe it’s because it’s used extensively later on but I don’t really see the danger, as the code stands in *Blog5*, in using an `Int` in what appears to be one single nested function.
Good insight.
It really is not, for exactly the reason you mention. I’ve been using these wrapper types to clearly demarcate the data types expected to be passed between logical components, but here this wrapper type is used only inside a particular module. I added this wrapper type more out of trying to continue a good habit, rather than any sort of necessity.
That said: the Z-machine has a fairly complicated character model. Later versions need to be able to track keypresses of non-printable characters, and so there is some justification there for having a type specifically for characters meaningful in the Z-machine. So this might come in handy later. Or not. Hard to say.
I’d be interested in hearing your thoughts on first-citizen language pattern matching in general, seeing as it appears to be showing up more and more. I’ve used it just a little bit, in Racket (a Scheme derivative) and in Rust. From what I can tell it’s really just a more declarative approach to features which already exist, and that often anything more than a very simple case becomes a tangled mess of patterns. Is that unfair?
It certainly is moving what used to be a bunch of imperative checks into a somewhat more declarative form.
As for having a tangled mess: my goal is that the code clearly reflect the complexity inherent in the domain. If you were to read the specification for the zstring format you’d see that there was pretty much one sentence in English for each of the cases I’m matching. If this code is long and complicated, that’s because the domain is long and complicated.
One of the things I like about patterns in OCaml is that if they are a tangled mess, at least the compiler tells me when I’ve likely got it wrong! The compiler will warn if there are cases that can be shown to be unreachable, or if there are values that are matched by no pattern. That’s worth the price of admission right there.
In Rust, pattern matching using `match` requires you to define enough patterns to cover all possible cases. This is especially useful with enums (which are more “discriminated unions” than enums), since you’ll get a compile-time error if you add a variant to an enum and forget to cover it in one of your `match` expressions (which is only relevant when you test all variants individually, i.e. the `match` doesn’t use a wildcard pattern on that enum).
In other languages, such as Haskell, your code will compile even if the match is not exhaustive; if you match on an expression that doesn’t match any of the patterns, a run-time error is raised.
Also, again in Rust, pattern matching (using either `match`, `if let` or `while let`, or a plain `let` if the enum only has one variant) is the only way to access the data members of enum variants (though some libraries may expose data members through methods). So for this specific case, pattern matching is necessary.
I have to admit that in code like this:
| (_, Abbrev Abbreviation a) ->
let abbrv = Abbreviation (a + zchar) in
I start to lose track of what the difference is between Abbrev, Abbreviation, and abbrv.
Do you have syntactic highlighting in your editor to keep track of the semantics of each of those slightly different forms of the same word?
If I understand correctly (everything I know about OCaml I’ve learned from this blog series….), Abbreviation is a named constructor function for the abbreviation_number type, while Abbrev is a constructor function for the string_state type. So semantically, both are functions, and do the same thing for two different types – so it doesn’t seem like syntax highlighting would help much. abbrv is a local variable, so it’s semantically completely different.
It seems like if you need to add a constructor for a third or fourth type which also relates to abbreviations, you’re going to run out of different forms of the word “abbreviation” pretty quickly. Are there naming conventions in OCaml for dealing with this?
I am not entirely convinced that this was a good move for abbreviations, this nesting. As you note, it seems like it might be more confusing than helpful.
We’ll see this pattern again when I need to make a distinction between a reference to storage, which might be a local variable, and a number representing an index into the table of local variables. There I think it makes good sense, but this one, I’m still on the fence about a bit.
Pingback: Round room | Fabulous adventures in coding
Pingback: Another troll room | Fabulous adventures in coding
Pingback: Cellar | Fabulous adventures in coding