Life, part 38

Here we go again!

Fellow BASIC enthusiast Jeff “Coding Horror” Atwood, of Stack Overflow and Discourse fame, has started a project to translate the programs in the 1978 classic BASIC Computer Games into more modern languages. I never had a copy of this one — my first computer book was Practise Your BASIC — but these programs are characteristic of the period and I am happy to help out.

graphic of page

1978 was of course well into the craze of writing Life simulators for home personal computers. In this episode I’ll break down the original program assuming that you’re not a BASIC old-timer like me; this language has a lot of quirks. You can read the commentary from the book here, and f you want to run the original program yourself the copy here is a valid Vintage BASIC program.

Let’s get into it!

2 PRINT TAB(34);"LIFE"
4 PRINT TAB(15);"CREATIVE COMPUTING  MORRISTOWN, NEW JERSEY"
6 PRINT: PRINT: PRINT
8 PRINT "ENTER YOUR PATTERN:"
  • All lines are numbered whether they need to be or not. Line numbering is often used to indicate program structure; we’ll come back to this in a bit.
  • Lines are not one-to-one with statements! A single line may contain several statements separated by colons.
  • PRINT takes a semicolon-separated list of zero or more expressions.
  • PRINT outputs a newline by default; as we’ll see presently, there is a way to suppress the newline.
  • TAB(x) does not produce a string; it moves the “print cursor” to column x. In many versions of BASIC, TAB can only appear as an argument to PRINT.
9 X1=1: Y1=1: X2=24: Y2=70
10 DIM A(24,70),B$(24)
20 C=1
  • Variable names are typically one or two characters.
  • 24 x 70 is the size of the Life board that we’re simulating here; this was quite a large screen by 1978 standards. I owned both a Commodore 64 and a Commodore PET with 40×25 character screens; it was possible to get an 80-column PET but they were rare.
  • Note that the authors have made the bizarre choice throughout that (X, Y) coordinates refer to (line, column) and not the more standard (column, line) interpretation. Normally we think of X as being the left-right coordinate, not the top-bottom coordinate.
  • Single-valued variables need not be declared.
  • Array-valued variables are “dimensioned”, and the dimensions typically must be literal constants. Here we have a two-dimensional array and a one-dimensional array.
  • Anything that is string-valued has a $ suffix, so B$ is an array of strings.
  • Arrays and strings are both indexed starting from 1.
  • As we know from early episodes in this series, a naïve implementation needs to keep track of both the “current” and “next” state. In a memory-constrained environment such as can be assumed by a 1978 BASIC program’s author, we need to keep all that in the same array. Spoiler alert: the convention used by this program for the contents of array A is:
    • 0 means cell is dead, will stay dead.
    • 1 means cell is alive, will stay alive.
    • 2 means cell is alive, will die.
    • 3 means cell is dead, will come alive.
  • You might have noticed that the program is formatted to have very few unnecessary spaces. Having become accustomed to vertical and horizontal whitespace being used to reduce eyestrain and guide the reader, it looks dense and hard to read. This was not a stylistic choice though; it was imposed by the limitations of the technology of the time. Line lengths were constrained, often to the width of the screen, and when you have only 40 or 80 characters of screen, unnecessary spaces are bad. But more interesting from my perspective as a compiler writer is that implementations of BASIC often tokenized as the user typed in the program, and then stored only the tokens in memory, not the original source code as it was typed. When the program was listed back, the program printer reconstituted the program from the token stream which had discarded the spaces. Pre .NET versions of VB did this!
30 INPUT B$(C)
40 IF B$(C)="DONE" THEN B$(C)="": GOTO 80
50 IF LEFT$(B$(C),1)="." THEN B$(C)=" "+RIGHT$(B$(C),LEN(B$(C))-1)
60 C=C+1
70 GOTO 30
  • INPUT takes a string from the console and places it in the given variable.
  • Arrays are indexed using parentheses.
  • This is a “while” loop but most BASICs did not have loop structures beyond FOR/NEXT, or even ELSE clauses for IF/THEN statements. You had to write GOTO statements to build those control flows, as shown here. Notice that all the colon-separated statements execute if the condition of the IF is met; otherwise we continue on to the next statement. (I was fortunate to have access to an original PET with Waterloo Structured BASIC which did have while loops, though I did not understand the point of them when I was in elementary school. Many years later I ended up working with some of the authors of Waterloo BASIC in my first internships at WATCOM, though I did not realize it at the time! The whole saga of Waterloo Structured BASIC and the SuperPET hardware is told here.)
  • LEFT$, RIGHT$ and LEN do what they say on the tin: give you the left substring of given length, right substring of given length, and length of a string. Strings are not directly indexable.
  • + is both the string concatenation operator and the arithmetic addition operator.
  • It seems clear that we are inputting strings and sticking them in an array until the user types DONE, but why are we checking whether the first character in the string is a period, and replacing it with a space? It is because in some versions of BASIC, INPUT strips leading spaces, but they are necessary when entering a Life pattern.
  • Did you notice that nothing whatsoever was explained to the user of the program except “enter your pattern”? Enter it how? Why should the user know that spaces are dead, leading spaces are soft, a period in the first place is a hard space but a live cell anywhere else, and that you type DONE when you’re done? The assumption of the authors is that the only person running the code is the person who typed it in. “You should explain as little as possible” is not a great attitude to drill into beginner programmers.
  • Notice that we have no bounds checking. B$ was dimensioned to 24 elements; if the user types in more than 24 lines in the pattern, the program crashes. Similarly we have no checks on the lengths of the strings themselves. A whole generation of young programmers was taught from their very first lesson that crashes are the user’s fault for doing something wrong.
  • Our program state by the time we break out of the loop is: B$ has C lines in it, and B$(C) is an empty string.
80 C=C-1: L=0
90 FOR X=1 TO C-1
100 IF LEN(B$(X))>L THEN L=LEN(B$(X))
110 NEXT X
  • Since the last line is always blank, we are reducing the line count C by one.
  • The maximum line length L is initialized to zero, unnecessarily. I prefer an unnecessary but clear initialization to the lack thereof. But as we’ll see in a moment, some variables are just assumed to be initialized to zero, and some are initialized twice needlessly. More generally, programs that were designed to teach children about programming were chock full of sloppy inconsistencies and this is no exception.
  • FOR/NEXT is the only structured loop in many early BASICs. The loop variable starts at the given value and executes the body unless the loop variable is greater than the final value. Some versions of the language also had an optional STEP clause, and some would even keep track of whether the loop was counting up or down. Fancy!
  • Plainly the purpose of the FOR/NEXT loop here is to find the maximum line length of the pattern input by the user, but it appears to have a bug; we have strings in B$ indexed from 1 through C, but we are skipping the length check on B$(C). The subtraction here appears to be a mis-edit; perhaps the C=C-1 was moved up, but the developer forgot to adjust the loop condition. The bug only affects inputs where the last line is also the longest.
  • Is it inefficient to compute LEN(B$(X)) twice in the case where the maximum length L needs to be updated? Many versions of BASIC used length-prefixed strings (as do all versions of VB and all .NET languages) and thus computing length is O(1). When I first learned C as a teenager it struck me as exceedingly weird that there was no out-of-the-box system for length-prefixing a string. And it still does.
120 X1=11-C/2
130 Y1=33-L/2
140 FOR X=1 TO C
150 FOR Y=1 TO LEN(B$(X))
160 IF MID$(B$(X),Y,1)<>" " THEN A(X1+X,Y1+Y)=1:P=P+1
170 NEXT Y
180 NEXT X
  • There are no comments in this program; it would be nice to explain that what we’re doing here is trying to center the pattern that the user has just typed in to the interior of a 24 x 70 rectangle. (X1, Y1) is the array coordinate of the top left corner of the pattern as it is being copied from B$ to A; spaces are kept as zero, and non-spaces become 1. This automatic centering is a really nice feature of this program.
  • Once again we have no bounds checking. If L is greater than 67 or C is greater than 23, bad things are going to happen when we index into A. (Though if C is greater than 23, we might have already crashed when filling in B$.)
  • We already initialized X1 and Y1; we have not read their values at any point before they are written for a second time. By contrast, the population count P is accessed for the first time here and is assumed to be initialized to zero. Again, there is some amount of sloppiness going on here that could have been easily removed in code review.
200 PRINT:PRINT:PRINT
210 PRINT "GENERATION:";G,"POPULATION:";P;: IF I9 THEN PRINT "INVALID!";
215 X3=24:Y3=70:X4=1: Y4=1: P=0
220 G=G+1
225 FOR X=1 TO X1-1: PRINT: NEXT X
  • The input and state initialization phase is done. This is the start of the main display-and-compute-next-generation loop of our Life algorithm. The subtle indication is that the line numbering just skipped from 180 directly to 200, indicating that we’re starting a new section of the program.
  • Notice that two of our PRINT statements here end in semicolons. This suppresses the newline at the end. Notice also that the separator between G and “POPULATION:” is a comma, which instructs PRINT to tab out some whitespace after converting G to a string and printing it.
  • I9, whatever it is, has not been initialized yet and we are depending on it being zero. There is no Boolean type; in BASIC we typically use zero for false and -1 for true. (Do you see why -1 for true is arguably preferable to 1?)
  • We know that (X1, Y1) is the top left corner of the “might be living” portion of the pattern inside array A. (X2, Y2) and (X3, Y3) both appear to be the bottom right corner of the array, both being (24, 70) at this point, and (X4, Y4) is (1, 1), so it is likely another “top left” coordinate of some sort. Maybe? Let’s see what happens.
  • We reset the living population counter P to zero and increase the generation count G by one.
  • We then print out X1-1 blank lines. This implementation is quite smart for a short 1978 BASIC program! It is tracking what subset of the 24×70 grid is “maybe alive” so that it does not have to consider the entire space on each generation.
  • We’re continuing with the pattern established so far that X and Y are loop variables. Thankfully, this pattern is consistent throughout the program.
  • The assignment of P on line 215 is redundant; we’re going to assign it zero again on line 309 and there is no control flow on which it is read between those two lines.
230 FOR X=X1 TO X2
240 PRINT
250 FOR Y=Y1 TO Y2
253 IF A(X,Y)=2 THEN A(X,Y)=0:GOTO 270
256 IF A(X,Y)=3 THEN A(X,Y)=1:GOTO 261
260 IF A(X,Y)<>1 THEN 270
261 PRINT TAB(Y);"*";
262 IF X<X3 THEN X3=X
264 IF X>X4 THEN X4=X
266 IF Y<Y3 THEN Y3=Y
268 IF Y>Y4 THEN Y4=Y
270 NEXT Y
290 NEXT X
  • This is where things start to get interesting; this nested loop does a lot of stuff.
  • We are looping from (X1, Y1) to (X2, Y2), so this establishes the truth of our earlier assumption that these are the upper left and bottom right coordinates of the region of array A that could have living cells. However, note that the authors missed a trick; they set up (X1, Y1) correctly in the initialization phase, but they could have also set (X2, Y2) at that time as well.
  • We start with a PRINT because all the PRINTs in the inner loop are on the same line; we need to force a newline.
  • We update from the current state to the next state; as noted above, if current state is 2 then we were alive but we’re going to be dead, so we set the state to 0. Similarly, if current state is 3 then we were dead but are coming alive, so the state is set to 1.
  • It’s not clear to me why the test on line 260 is a not-equal-one instead of an equal-zero. There are only four possible values; we’ve considered two of them. It’s not wrong, it’s just a little strange.
  • In all cases where the cell is dead we GOTO 270 which is NEXT Y. Though some BASIC dialects did allow multiple NEXT statements for the same loop, it was considered bad form. The right thing to do was to GOTO the appropriate NEXT if you wanted to “continue” the loop.
  • Notice that there’s a syntactic sugar here. IF A(X,Y)<>1 THEN 270 elides the “GOTO”.
  • If we avoided skipping to the bottom of the loop then the cell is alive, so we tab out to the appropriate column and print it. Then we finally see the meaning of (X3, Y3) and (X4, Y4); as surmised, they are the upper left and bottom right corners of the “possibly alive” sub-rectangle of the next generation but I guessed backwards which was which. (X1, Y1), (X2, Y2) are the sub-rectangle of the current generation.
  • The line numbering pattern seems to have gone completely off the rails here and in the next section. This often indicates that the author got the code editor into a state where they had to introduce a bunch of code they did not initially plan for and did not want to renumber a bunch of already-written code that came later. The convention was to number lines on the tens, so that if you needed to come back and insert code you forgot, you had nine places in which to do it. Were I writing this code for publication, I would have taken the time to go back through it and renumber everything to a consistent pattern, but it really was a pain to do so with the editors of the era.
295 FOR X=X2+1 TO 24: PRINT: NEXT X
299 X1=X3: X2=X4: Y1=Y3: Y2=Y4
301 IF X1<3 THEN X1=3:I9=-1
303 IF X2>22 THEN X2=22:I9=-1
305 IF Y1<3 THEN Y1=3:I9=-1
307 IF Y2>68 THEN Y2=68:I9=-1
  • We’ve now processed the entire “currently maybe alive” rectangle so we print out the remaining blank lines to fill up the screen.
  • We copy (X3, Y3) and (X4, Y4), the “next generation” sub-rectangle to (X1, Y1), (X2, Y2) and it becomes the current generation.
  • Here we do something really quite clever that none of the implementations I looked at in my previous series handled. The authors of this algorithm have implemented a “rectangle of death” as a border of the array; that is a pretty standard way of handling the boundary condition. But what I really like is: they detect when a living cell hits the boundary and set flag I9 to true to indicate that we are no longer playing by infinite-grid Life rules! This flag is never reset, so you always know when you are looking at the UI that this is possibly not the true evolution of your initial pattern.
309 P=0
500 FOR X=X1-1 TO X2+1
510 FOR Y=Y1-1 TO Y2+1
520 C=0
530 FOR I=X-1 TO X+1
540 FOR J=Y-1 TO Y+1
550 IF A(I,J)=1 OR A(I,J)=2 THEN C=C+1
560 NEXT J
570 NEXT I
580 IF A(X,Y)=0 THEN 610
590 IF C<3 OR C>4 THEN A(X,Y)=2: GOTO 600
595 P=P+1
600 GOTO 620
610 IF C=3 THEN A(X,Y)=3:P=P+1
620 NEXT Y
630 NEXT X
  • Finally, we’ve got to compute the next generation. Note that we had a corresponding sudden increase in line number to mark the occasion.
  • We reset the population counter to zero and we loop over the currently-maybe-alive rectangle expanded by one cell in each direction, because the dead cells on the edge might become alive.
  • Variable C before was the number of valid lines in B$, our string array. Now it is the number of living neighbours of cell (X, Y). Even when restricted to two-character variables, they are in fact plentiful and there is no need to confusingly reuse them.
  • We count the number of living cells surrounding (X, Y) including (X, Y) itself, remembering that “is alive, stays alive” is 1, and “is alive, dies” is 2. Once we have the count then we have our standard rules of Life: if the cell is currently dead and the neighbour count is 3 then it becomes alive (3). If it is currently alive and the neighbour count including itself is not 3 or 4 then it becomes dead (2). Otherwise it stays as either 0 or 1.
  • We have a GOTO-to-GOTO bug here. That GOTO 600 could be replaced with GOTO 620 and save a statement.
635 X1=X1-1:Y1=Y1-1:X2=X2+1:Y2=Y2+1
640 GOTO 210
650 END
  • We did not track whether any cell on the border became alive, so the code makes the conservative assumption that the maybe-living-cells-in-here rectangle needs to be grown one cell on each side. Smart… or… is it?
  • Take a look at the logic on line 635, and then compare it to the looping constructs on lines 500 and 510. We loop from x1-1 to x2+1; we nowhere else read or write x1 or x2, and as soon as the loop is done, we reassign x1 to x1-1 and x2 to x2+1. It would have made more sense to do the increments and decrements first, and then do the loops!
  • Our program then goes into an infinite loop. This was common in BASIC programs of the time; when you wanted it to end, you just pressed RUN-STOP. I mean, what else are you going to do? Prompt the user? Poll the keyboard? Just run and the user will stop you when they want it stopped.
  • Some dialects of BASIC required an END statement even if it was unreachable.
  • Notice that there was never any “clear the screen” control here. It just constantly prints out new screens and hopes for the best as the old screen scrolls off the top!

I hope you enjoyed that trip down memory lane as much as I did. I am extremely happy to have, you know, while loops, in the languages I use every day. But I do miss the simplicity of BASIC programming some days. It is also quite instructive to see how you can implement all this stuff out of IF X THEN GOTO when you need to — and when you’re writing a compiler that takes a modern language to a bytecode or machine code, you really do need to.

Next time on FAIC: I’ll write this algorithm in idiomatic modern C# and we’ll see how it looks.

20 thoughts on “Life, part 38

  1. If you were really trying to save a few cycles, it could make sense to initialize a variable early. Typically in microsoft basics, the variables are stored in the runtime order they are encountered. When needed for evaluation a linear search is done. So an often-used variable would benefit from being first in storage.
    I think it is Atari basic which also tokenizes variables (and presumably looks them up quicker). Which has downsides of its own, of course. Apparently it could fail to forget about variable (number)s that were no longer in the program…

    • I worked on VBScript and JavaScript at Microsoft back in the 1990s; for locals we had the compiler do a name lookup pass and resolve them to indices early, but due to the dynamic nature of the languages we often had situations where we had to resort to lookups at runtime. Those lookups were done via hashing to attain O(1) time, but there was still a problem with hash collisions; on lookups which caused collisions we moved the most recently accessed collision to the front of the list so that if we were accessing it in a loop, the second time we would resolve the collision more quickly. So yes, this technique lived on for some time, but we tried to do it on behalf of the user rather than making the user have to make weird micro-optimizations to their code.

  2. I was able to easily port this to TI Extended BASIC (and Myarc Extended BASIC II) on the TI-99/4A. The biggest difference was that the statement separators in a line were double-colons instead of single and LEFT$/MID$/RIGHTS$ are all done with SEG$. That was a fun trip down memory lane!

  3. Are “Peek” and “Poke” part of this?

    As a ten year old, I fell in love with programming with the following on Atari with Basic cartridge

    10 print “Hello”
    20 goto 10
    Run

    That was it. I was hooked!!

    • PEEK and POKE are not part of BASIC proper, but rather ways to access the underlying features of the hardware in a non-portable manner. When I was a Commodore 64 programmer I had only the vaguest understanding of what stuff was in memory and how you might access it via PEEKing — that was arcane lore. Of course anyone can look up the entire memory map in a few seconds of searching on the internet today.

      I will remember forever that addresses 53280 and 53281 stored the screen background colours. At the time I had no idea why these particular numbers were chosen; in hindsight of course it is completely obvious that this is right near the start of memory block 0xD000, and so that block must be some sort of video memory. (And indeed, a quick check of the internet says that the 1KB block starting at D000 was colour and sprite data.)

      I too delighted in making Radio Shack computers on display at the mall produce an endless loop of random stuff. I certainly had no idea that this interest would lead me to work on a BASIC compiler a couple decades later. 🙂

  4. Pingback: The Morning Brew - Chris Alcock » The Morning Brew #3173

  5. “When the program was listed back, the program printer reconstituted the program from the token stream which had discarded the spaces. Pre .NET versions of VB did this!”

    I remember Excel 95(?), when they experimented with localizing VBA keywords (“Wenn” instead of “If” in the German version, etc.). That’s when I realized as a kid that the source code is probably not stored as plain text, because that would break compatibility across different language versions.

    • That was an amazing feature that never really gelled. It was a very well-meaning experiment; an experiment where you learn what doesn’t work is also a success. We learned that English keywords are not actually a high barrier to adoption of a language, but having the same program render in many different ways can be confusing and drive dissatisfaction.

      You are totally correct that the keywords were localized but you might not know that the experiment went beyond keywords. There was also at that time a lot of gear put inside IDispatch for localizing late-bound objects, so that if you had, say, a list object then it would have Count and Items properties in English, but Zahl and Elemente properties in German, and so on. I had to write (and test) a lot of gear in VBScript to ensure compatibility with a feature that no one ended up using.

      The dedication towards localizability was a very real value of the VB team. The localization PM had a sign on his office door which said “ENGLISH IS JUST ANOTHER LANGUAGE” to remind people. VB was the first Microsoft product that shipped a single binary that worked everywhere in the world, which was IIRC VB 3. Prior to that, every localized Microsoft product had one binary for Europe and North America, one binary for Japan, one binary for China, one binary for Korea, and so on.

      I was part of that effort as an intern — one of my many jobs was running the nightly tests in Korean, a language I do not speak or read in any way, so that was fun. And then as a full timer one of my first tasks was to do a bug-removal pass over the string handling library to ensure that the same code would work on every localized version of 16 bit Windows whether DBCS or SBCS, and Unicode. I fixed a lot of bugs.

      • Ah, Excel’s localization… where function names such as “COUNT” and “IF” were translated to “СЧЁТ” and “ЕСЛИ” in Russian while leaving the cell names like “G2” intact. Now you had to keep endlessly switching the keyboard layouts from Cyrillic to Latin to Cyrillic to Latin to type any moderately long formula — untill one just gave up and point-and-clicked the cells with the mouse instead *sigh*

      • I did not know about that plan to localize object methods, that’s an interesting idea! That, however, *would* have broken cross-(human-)language portability in a big way (since both VBScript and late-bound member access must be stored as strings), so I’m really glad it never took of.

        (BTW, what “version” of the VBA keywords did you disallow as variable names? The English version? The version of the current locale? All possible translations?)

        I do remember that Microsoft was a shining example for well-done localization: Even documentation was (apparently) translated by people who were not only very proficient in German, but who also understood the product and used terminology consistently.

      • > ENGLISH IS JUST ANOTHER LANGUAGE

        Reminded me of Michael Kaplan’s (rest in peace) Sorting It All Out blog. It was always surprising to realize how much effort went into all the various “izations” and what a huge expense/difficulty curve upwards there was from a small US software shop to having a huge international audience.

        Michael was one of the “old guard” MSDN bloggers I’d check every day (including Raymond, Larry Osterman, and you). It’s astonishing to realize that’s now within spitting distance of being 20 year ago and fantastic that many of you guys are still active.

  6. With respect to data layout, COM and VBA are (like Fortran) column-major, as compared to the C-heritage languages (including .NET) which are typically row-major. I presume that’s a carry-over from early Microsoft Basic (or BASIC) implementations. I don’t know if this was an official part of BASIC or if it’s a quirk of Microsoft’s implementation. This might account for transposition from what you’re used to seeing.

    Variable names are one or two characters because only the first two characters are significant. (I forget if B$ is a distinct identifier from B.)

  7. I had this book and it’s sequel when I was a kid. They were among my most favorite books. I ran these programs on my Apple ][e and had hours of fun. I was happy to find the PDFs on archive.org a while back.

  8. Back in the 80’s I used to code lots in Basic on my c64. As C# coder today I’m horrified seeing the 1 letter variable names people used to work with in Basic back then. Why was this? Longer variable names consume more space in ram?

      • Also, imagine that you are implementing the BASIC interpreter for a C64 in 6502 assembler. Variable lookups are entirely by name and a new variable can be introduced at any time in the program. What data structure do you use for a variable lookup? Seriously, give that some thought. How big is that data structure, and how fast are lookups? And what can you do to optimize it? Things get a lot easier to make small and fast if variables names are fixed to be exactly one or two characters.

  9. There is a pretty cool ‘game’ on Steam at the moment called Comet 64. This is what it sounds like – a C64 simulator. Thought I’d mention it as I think you’d like it 🙂

  10. Do you think you could ever go through Tony Finch’s ListLife algorithm? I’ve always been curious about it, but I could never grasp it.

    Either way, thanks for making this series. For the longest time I’ve been trying and failing to wrap my head around life algorithms other than Abrash’s (the original) and the naive one, that this series has been a godsend. Especially since you been through the whole of it, this has been quite instructive and insightful.

    Cheers.

  11. Realized it’s been about a year since your last post, but I still check occasionally to see if there have been any new fabulous adventures.

    Hope everything is well for you and yours, Eric.

Leave a comment