This is a small room with passages to the east, northeast and south, and a forbidding hole leading west. Bloodstains and deep scratches (perhaps made by an axe) mar the walls. A troll, brandishing a bloody axe, blocks all passages out of the room.
> kill the troll with the elvish sword
The troll is struck on the arm; blood begins to trickle down.
The axe crashes against the rock, throwing sparks!
The fatal blow strikes the troll square in the heart: he dies. As the troll breathes his last breath, a cloud of sinister black fog envelops him; when it lifts, the carcass is gone.
> go east
In this episode we’re going to look at one of the most powerful feature of OCaml: pattern matching.
Code for this episode can be found at https://github.com/ericlippert/flathead/tree/blog5
Recall that our attack on the problem of decoding compressed zstrings into real strings is to implement a state machine. That is, given a current character and current state, what is the new state and the new output — the output being a fragment of a string. Here’s the code:
let process_zchar (Zchar zchar) state = match (zchar, state) with | (1, Alphabet _) -> ("", abbrev0) | (2, Alphabet _) -> ("", abbrev32) | (3, Alphabet _) -> ("", abbrev64) | (4, Alphabet _) -> ("", alphabet1) | (5, Alphabet _) -> ("", alphabet2) | (6, Alphabet 2) -> ("", Leading) | (_, Alphabet a) -> (alphabet_table.(a).(zchar), alphabet0) | (_, Abbrev Abbreviation a) -> let abbrv = Abbreviation (a + zchar) in let addr = abbreviation_zstring story abbrv in let str = read story addr in (str, alphabet0) | (_, Leading) -> ("", (Trailing zchar)) | (_, Trailing high) -> let s = string_of_char (Char.chr (high * 32 + zchar)) in (s, alphabet0)
Wow, there is a lot going on here. This is a very violent introduction to the matching syntax. But we’ll get through it!
First off, I’ve created yet another simple wrapper type, this one around characters extracted from zstring encodings.
OCaml allows you to create tuples by simply making a parenthesized, comma-separated list of values. So
(zchar, state) is just a newly-created two-element tuple containing the character and the state. Similarly, on the next line,
("", abbrev0) is a two-element tuple containing a string and a state.
match means “I’m going to give you an expression and a pile of bar-separated patterns that might match this expression. Go down the list and find the first pattern that matches this expression; the thing to the right of the arrow is the value I want you to produce for this case.” It’s sort of like an “if-then” and sort of like a switch in C#, but it operates on expressions and has a very powerful pattern matching syntax. (An expression-matching syntax is proposed for C# 7 and I would love to have it!)
Let’s look at the patterns. Each pattern is of course a tuple pattern, so we ask “does the first pattern in the tuple pattern match the first element in the tuple? (that is, the zchar.) Does the second pattern in the tuple pattern match the second element in the tuple? (that is, the state)” If we get both matches, then the pattern as a whole matches.
Let’s look at the individual patterns. The patterns 1, 2, 3, 4, 5, 6 are pretty obvious; these are just ordinary “switch” cases. If the zchar equals that value then the pattern matches.
Alphabet _ means “match any state that was constructed with the Alphabet constructor, and I don’t care what value was given to the constructor”.
So the first match case means “if the zchar is 1 and the state is any alphabet state then the result is an empty string and the new state is abbrev0.
Now that you understand that, the next few patterns should be easy to understand. Right up to
| (_, Alphabet a) -> (alphabet_table.(a).(zchar), alphabet0)
The _ pattern means “match any zchar value”. The
Alphabet a pattern means “match any state created with the Alphabet constructor, and I would like to name the value that was passed to the constructor
a. This value is of course an integer, so we can then use it as the index into an array to fetch the string we want. (And of course the new state goes back to alphabet 0 for processing the next character.)
This one is a little bit interesting:
| (_, Abbrev Abbreviation a) -> let abbrv = Abbreviation (a + zchar) in let addr = abbreviation_zstring story abbrv in let str = read story addr in (str, alphabet0)
Notice that we are unwrapping two levels of wrapping here. We want a state that was constructed with the Abbrev constructor, whose argument was constructed with the Abbreviation constructor, which was constructed with an integer, call it
Notice also that in this code I’m assuming that we have a method
read that can take a
Zstring address and produce a string, but this is precisely what we are attempting to write! I’ll leave implementation of this method until next time; if you think that it is going to have to somehow be recursive with respect to
process_zchar, you’re right.
Next time on FAIC: let’s put together everything we’ve seen so far in string decoding.