Introducing Cairntaker, a garbage collector

I’ve been using JavaScript for a while now, and I’m finally getting fed up with it in ways that I just can’t deal with. Or can I? Now that I’m fed up with its memory model, I’ve built a garbage collector. In JS.

I call it Cairntaker. It’s for programmers who want well-tended heaps.

(NOTE: This project hasn’t really been tested, and it’s almost certain to have silly bugs. It might even have unacceptable performance issues. Use it at your own risk, and let me know how it goes!)

Weak tables

JavaScript doesn’t have weak tables. Weak tables, which are hashes which hold their values for only as long as their keys are reachable, are useful as a way to attach metadata to values in an existing program. In fact, they’re generally handy whenever you want to be very particular about when some data becomes unreachable: They let you balance a value on top of two other values (the table and the key) so that your value crumbles away as soon as either of its parents do.

There are other kinds of weak references in the same vein as weak tables, but weak tables are sufficient to represent them:

  • An ephemeron, as seen in Racket, is an ordered pair whose second value is held for only as long as the first value is. It’s essentially a weak table with a single entry.
  • A weak box is a single-item container whose value is always held weakly, period. It’s essentially an ephemeron which maps the value to itself.

In fact, strong data structures also boil down to weak tables. A strong ordered pair can be modeled as a weak table whose keys happen to be two specific values that never become reachable.

The Racket docs cite a particular paper on ephemerons: “Ephemerons: A New Finalization Mechanism” (Hayes ’97). I found it very useful for implementing this aspect of the garbage collector, and as inspiration for dealing with the immutable value issues I’m about to explain.

Immutable values

All the immutable values I care about are either atomic symbols or immutable weak tables.

Actually, they’re almost the same thing.

When comparing immutable weak tables for equality (in order to use them as keys themselves!), it’s worth noting that two immutable weak tables that currently hold the same data might also hold distinct unreachable data that has been collected. So we can define atomic symbols as an infinite supply of distinct weak tables whose keys are all unreachable.

In Cairntaker, the difference is that symbols have names. These names are used for storing simple data that would be much too cumbersome to encode as tables, such as strings and numbers.

Anyway, for garbage collection purposes, symbols are always reachable. Even when there are no strong references to a symbol, Cairntaker assumes that we might reconstruct that symbol by recreating its name. When there are no weak references to a symbol either, it’s collected.

Similarly, Cairntaker’s immutable tables are always reachable if all their keys and values are reachable, even if there are no strong references to the table itself. This is essential in order to define a weak table constructor that’s referentially transparent. If we construct a table once, create another table with that one as a key, and then forget the key, we must be able to retrieve it again by running the same constructor code. Oddly enough, I haven’t seen this concern mentioned anywhere… not even in Haskell’s variation of ephemerons, which dodges the issue by requiring IO. Cairntaker may be making new strides in this area.

Incremental

Any garbage collector built on top of JS better not be a stop-the-world collector that grinds the page to a halt and triggers “Stop running this script?” dialogs. ;) Cairntaker is an incremental collector, which runs for as long as you ask it to and then simply returns, giving full control back to you. I expect Cairntaker to be run for short bursts on a frequent interval, interspersed with short bursts of computation that use the heap.

In order to implement a garbage collector with this kind of flexibility, the paper “Real-Time Non-Copying Garbage Collection” (Wilson and Johnstone ’93) was indispensible. In fact, a post on the Arc Forum mentioning that paper was probably what got me started on this project.

It turns out a trampolining idiom came in handy for representing the state of the computation. One computation could “become” another, and I could organize my code in a monadic style, with functions that returned these computations.

Where to take it from here

Vaguely, I expect Cairntaker to come in handy someday in one of my future language designs.

For a while I thought I could whip up a version of Fexpress that used Cairntaker. Fexpress is my stab at a side-effect-free fexpr language with partial evaluation. As it turns out, there are still some aspects of Fexpress’s design that I need to think about a lot more before I continue. I’d like the partial evaluator to project idiomatic Fexpress code neatly into untyped lambda calculus, so that it could take advantage of a large body of existing research, and yet I wasn’t working in that direction.

Just to have something to try Cairntaker with, I’ve written up a little untyped lambda calculus evaluator, in the same file as the garbage collector. The evaluator runs, but it doesn’t actually put any weak structures to the test, and I haven’t tried running the garbage collector itself yet.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s