How to Implement Variable Assignment in a Programming Language

I’m working my way through Crafting Interpreters. The project guides programmers through building their own interpreters for the Lox programming language. I started writing a blog series about my progress. You can see all the posts so far right here.

In the last post covering part of Chapter 7, we talked about error handling, at that time largely in the context of parsing and interpreting expressions.

Now in Chapter 8 we get into some of the fun stuff—state.

I have talked about state a couple of times in the past on this blog—in particular, here, during the series about the book Structure and Interpretation of Computer Programs. If you like this series, you’ll probably like that one 😉.

The short version:

Procedural code without state—for example, a stateless function—amounts to a series of substitution operations in which an argument name serves as a placeholder that gets substituted at runtime with the passed-in inputs. The same inputs always result in the same output, and this makes the function “pure.” (In Crafting Interpreters, Bob Nystrom points out what we’re all thinking: that the programmers who wax poetic about their devotion to dispassionate logic also label said functions with such weighty language as “pure.”)

Add state, though, and this changes. Bob equates the presence of state to the Lox programming language’s “memory.” And just like a human memory, it influences statements’ behavior. The result from print a depends on what we assigned to a .

It also means that the programming language has to recognize a , which we can accomplish by:

  1. Parsing some assignment syntax like var a = "Hallo!"
  2. Interpreting that assignment syntax by assigning a key a in a key-value store to point to the value "Hallo!"

The key-value store doesn’t have to be all that fancy.

In the 80 line Scheme interpreter that we talked about in this previous post, we use a plain Python dictionary for the task. In fact, Python itself used dicts to delineate scope in early versions (here’s the PEP where Python solidified the thrust of their current approach to state scoping, back in version 2.1). In Crafting Interpreters, Bob does similar:

package com.craftinginterpreters.lox; import java.util.HashMap; import java.util.Map; class Environment < private final Mapvalues = new HashMap<>(); void define(String name, Object value) < values.put(name, value); >Object get(Token name) < if (values.containsKey(name.lexeme)) < return values.get(name.lexeme); >throw new RuntimeError(name, "Undefined variable '" + name.lexeme + "'."); > >

The Environment class wraps a plain Java HashMap . Two methods wrap the HashMap ‘s put and get functions with semantically relevant names. That’s it.

This implementation of Environment supports only global scope: there’s one environment, all its variables are accessible everywhere, and either they’re in there or they’re not. Later, when we endeavor to include nested scopes, Environment gets a new attribute—a reference to another Environment , set to nil for the global scope—and when a variable is referenced, the environments check themselves and then, recursively, their enclosing environments, until they reach the global one.

I’ll bring up that SICP post one more time here because we do something similar in that implementation, although we copy the parent environment rather than referencing it:

For nested expressions, our nested environments are recursively passed through, first to seval (which evaluates expressions), then back into a new environment if evaluation reveals another procedure.

…we’re slapping our new local env on top of a copy of the existing environment, and for a huge environment that will be really space inefficient. We’ve talked on this blog before about a similar situation: tracking nested transactions in an implementation of an in-memory database. If this were a real interpreter and not a toy for learning purposes, I might turn to a solution like that one. Note that the nested scopes wouldn’t face that annoying deletion snafu we faced with the database: local scopes can overwrite the values of variables from broader scopes, but they don’t delete them.

So the implementation isn’t super clever.

P.S. If you’re interested in trying out implementing a nested scope yourself, I gotchu! Here’s an interactive activity that I wrote for my Python Programming students to practice implementing database transactions for a pet shop database (complete with fun, dancing parrots!). It’s level-appropriate for someone who is comfortable using dictionaries/hashmaps/associative arrays in at least one modern (last updated after 2012) object-oriented programming language.

To get this running locally, go to your command line and do:

  1. git clone https://github.com/MPCS-51042/materials.git
  2. cd materials
  3. pip install jupyter (there’s also a pipfile if you’re a pipenv user, but I don’t want folks who aren’t regular Pythonistas to have to screw around with virtual environments just to try this exercise)
  4. jupyter notebook (this is going to open a tab in your browser with a list of folders)
  5. Navigate to materials > database_efficiency_exercise > 2_Transactions_(Space_Efficiency).ipynb
  6. The file is a REPL: you can run any of the code cells with SHIFT + ENTER. I recommend starting by going to the top bar and clicking Kernel > Restart and Run All. This will run the code I have provided so you have the Table implementation available in scope, for example.
  7. Try out the activity! I have tried to make the instructions as clear as possible and would welcome feedback on whether you were able to complete the exercise on your own 😊

A simple implementation does not mean an absence of decisions, by the way.

From this chapter of Crafting Interpreters about variable declaration and assignment, one of the things I most appreciated was the time that Bob took to consider the decisions that programming language authors make about how variable referencing and assignment work. Let’s go over some of those decisions:

1. Will the language use lexical or dynamic scope for variables?

That is, will the are over which the variable is defined by visible in the structure of the code and predictable before runtime (lexical scope)? In most modern programming languages, it is. But we can see an example of dynamic scope with methods on classes. Suppose we have two classes that each possess a method with the same signature, and a function takes in either of those two classes as an argument. Which of the twin methods is in scope at runtime depends on which class the argument passed to the function is an instance of. That class’s method is in scope. The other class’s method is not in scope.

Lexical scope allows us to use specific tokens to identify when to create a new child environment. In Lox, we use curly braces for this <> , as do Swift and Java. Ruby also functions with curly braces, although using do..end is more idiomatic. Python determines scope with level of indentation.

2. Will the language allow variable reassignment?

If we want immutable data that has no chance of changing out from under a programmer, then we can insist that the value of a variable remain the same after it is created until it is destroyed. Haskell does its best to force this, and constants in many languages aim to provide this option.

Most programming languages allow for mutable variables, though, that programmers can define once and then reassign later. Many of them distinguish these from constants with a specific keyword. For example, JavaScript provides const for immutable constant declaration and var for mutable variable declaration. Swift does the same with let and var . Kotlin does it with val and var (“var” is a very popular choice for this).

3. Will the language allow referencing undeclared variables?

What happens if a programmer says print a; and no a has been declared? Lox raises an error at compile time. Ruby, by contrast, allows it. In Ruby, puts a where a is not declared will print nothing, but it won’t error out. puts a.inspect will print nil . Bob calls this strategy “too permissive for me” in the chapter. I…tend to agree, but I also don’t relish being all the way at the other end, say, at Java 6, where we had to null check everything in perpetual fear of the dreaded Null Pointer Exception.

I’m a fan of the Optional paradigm that we’ve seen in some more modern programming language implementations like Java 8, Swift, and Kotlin. I want to know if a variable could be null so I can handle it appropriately.

4. Will the language allow implicit variable declaration?

Lox distinguishes between declaring a new variable ( var a = "What's"; ) and assigning it a new value ( a = "Up?" ;). Without that var keyword, it will assume the intent is to reassign, and it will raise an error at compile time because the variable is not yet declared.

Ruby and Python, by contrast, assume that an assignment to a variable that does not yet exist means declare it. So a = 1 in Ruby sets 1 to a if a is nil, and reassigns a to point to 1 if it was previously pointing to something else.

Convenient, for sure. But in my experience, this does open programmers up to more scope-related insidious bugs because an initialization of a local variable and a reassignment of a global one look the same.

Languages might always differ on some of these questions.

And that’s okay: different languages serve different purposes and different communities. A programming language author can establish and communicate a perspective on a language’s purpose and use cases, then allow those decisions to drive implementation choices.

If you liked this piece, you might also like:

This talk about what counts as a programming language, explained with a raccoon stealing catfood from a cat…somehow

The Raft series, of course! Learn to implement a distributed system from the comfort of your home, complete with gifs and smack talk!

Lessons from Space: Edge Free Programming—about what launchpads have in common with software, and what that means for you and your work!