Design and due diligence of the Cene language

Cene is a language I’ve built over the last couple of years. I’ve talked about Staccato and Tenerezza here, and that code has turned into Cene.

What sets Cene apart: Extensibility support

Cene’s design revolves around the primary idea that future generations will have better ideas for programming languages than we do, so most of what sets it apart is its support for custom languages, which mainly has to do with the design of its macroexpander.

Cene’s macroexpansion phase incrementally writes definitions (of macros, functions, etc.) to monotonic state resources using deterministic concurrency. These state resources are expressive enough that user-defined macros can use them to achieve combinations of open-world and closed-world extensibility, which is what I consider to be Cene’s primary feature.

(This was also the primary feature that I started to implement in Blade back in 2010. So it doesn’t feel like Cene is an especially distinct and cohesive project, just an ongoing accumulation and refinement of my preferences in a language.)
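To make the idea of monotonic state with deterministic concurrency a bit more concrete, here is a minimal Python sketch. It is not how Cene is implemented; the command tuples, the `run` scheduler, and the task names are all hypothetical, chosen only for illustration. Each task is a generator that yields command values: a write can only add a definition, and a read waits until some task has supplied the name.

```python
class ConflictError(Exception):
    """Two tasks tried to give the same name different definitions."""


_DONE = object()  # marks a task that has finished


def _advance(task, value=None):
    """Resume a task and return its next command, or _DONE if it finished."""
    try:
        return task.send(value)
    except StopIteration:
        return _DONE


def run(tasks):
    """Run generator tasks to completion and return the definitions they made.

    A task yields ("define", name, value) to write a definition, or
    ("read", name) to wait until some task has defined that name.
    (Hypothetical protocol for this sketch only.)
    """
    definitions = {}
    active = [(task, _advance(task)) for task in tasks]
    while True:
        active = [(task, cmd) for task, cmd in active if cmd is not _DONE]
        if not active:
            return definitions
        progressed = False
        upcoming = []
        for task, cmd in active:
            if cmd[0] == "define":
                _, name, value = cmd
                # Writes only ever add information. Repeating an identical
                # write is harmless; a contradictory one is an error.
                if name in definitions and definitions[name] != value:
                    raise ConflictError(name)
                definitions[name] = value
                upcoming.append((task, _advance(task)))
                progressed = True
            elif cmd[0] == "read":
                if cmd[1] in definitions:
                    value = definitions[cmd[1]]
                    upcoming.append((task, _advance(task, value)))
                    progressed = True
                else:
                    # Still blocked on a name nobody has defined yet.
                    upcoming.append((task, cmd))
            else:
                raise ValueError(f"unknown command: {cmd!r}")
        if not progressed:
            raise RuntimeError("tasks are blocked on undefined names")
        active = upcoming


def total_task():
    base = yield ("read", "base")  # waits for another task to define "base"
    yield ("define", "total", base + 5)


def base_task():
    yield ("define", "base", 10)


# Listing the tasks in either order produces the same definitions.
first = run([total_task(), base_task()])
second = run([base_task(), total_task()])
print(first == second)  # True
```

Because a task can only observe a definition once it exists, and definitions never change afterward, the two tasks in this example cannot tell which of them was scheduled first.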

To support the use of those open-world and closed-world extension features, Cene has support for orderless collections, so that programmers can be sure that the order of extensions in those collections doesn’t matter. Cene’s orderless collections are tables, and they’re very much like hash tables. However, languages usually offer a way to iterate one by one over the keys of a hash table, giving the table an accidental order. Cene doesn’t do that. Instead, Cene offers an exhaustive set of combinators to construct commutative and associative operations, and any of those operations can be applied to a table to reduce it to a single value. There’s a similarly exhaustive set of comparison operations, which can be applied to a table to sort it into a list of smaller tables.

Those combinators can actually call back to Cene code in some places, making them very expressive. For this to preserve the mystery of the table ordering, this Cene code must be pure.
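Here is a rough Python sketch of the idea, under some loud assumptions: the names `Merge`, `SUM`, `MAX`, `both`, and `Table.reduce` are invented for this example and are not Cene’s API, and Python cannot actually stop someone from constructing a non-commutative `Merge` by hand, whereas in Cene the set of combinators is what guarantees the property.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Merge:
    """A commutative, associative operation together with its identity."""
    identity: Any
    combine: Callable[[Any, Any], Any]


# A few building blocks. Each one is commutative and associative
# (SUM is only exactly so for integers, not for floats).
SUM = Merge(0, lambda a, b: a + b)
MAX = Merge(float("-inf"), max)


def both(left, right):
    """Combine two merges into one that works on pairs, componentwise."""
    return Merge(
        (left.identity, right.identity),
        lambda a, b: (left.combine(a[0], b[0]), right.combine(a[1], b[1])),
    )


class Table:
    """A key/value collection that offers no way to iterate over its keys."""

    def __init__(self, entries):
        self._entries = dict(entries)

    def reduce(self, merge, view):
        """Map each entry through the pure function `view`, then merge.

        Since `merge` is commutative and associative, the internal
        iteration order cannot affect the result.
        """
        result = merge.identity
        for key, value in self._entries.items():
            result = merge.combine(result, view(key, value))
        return result


# Two tables with the same entries, inserted in different orders.
a = Table([("x", 3), ("y", 7), ("z", 5)])
b = Table([("z", 5), ("x", 3), ("y", 7)])

count_and_max = both(SUM, MAX)


def entry_view(key, value):
    return (1, value)  # contribute 1 to the count and the value to the max


print(a.reduce(count_and_max, entry_view))  # (3, 7)
print(b.reduce(count_and_max, entry_view))  # (3, 7)
```

The `view` callback plays the role of the Cene code that a combinator calls back into. If it had side effects, such as appending each key to a list, it would leak the iteration order, which is why it has to be pure.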

In fact, Cene is a thoroughly pure language. It accomplishes all of its I/O in a monadic style, or in other words by generating values that designate what commands are to be performed. When it comes to macroexpansion-time side effects, the command values can themselves be combined by a commutative and associative operation: Concurrency.

Other Cene features

Cene has a follow-heart operation which is still a pure function, but its behavior is fully determined by the language implementation. I intend to use this to manage error handlers and debugging, as I’ve talked about on this blog before. I haven’t explored this concept much in Cene, mainly because I haven’t written any application-specific error handlers yet.

Cene has a syntax made of lists and strings like Lisp, with careful attention paid to the design of the string and quasiquotation syntaxes. This is primarily intended in preparation for a future world where generating Cene code is a useful thing to do. For now, some of the string syntax features come in handy for writing string-based DSLs.

Due diligence

I often say to people that I make programming languages, but there are so many aspects of a serious programming language that I’ve always neglected in my languages, considering them too unstable to be worth the effort. In 2016, I finally went outside my comfort zone and did a lot of this work for Cene:

  • I designed a command line interface for Cene from the top down, so that it could integrate with build automation tools like Gulp and Travis CI. When Cene’s macroexpander installs definitions, some of those definitions are the contents of files to generate, so Cene is right at home in a build process.
  • I gave Cene a semantically thorough JavaScript FFI so that it can build on JavaScript’s platforms and libraries. Although the FFI is thorough, it’s still very abstract, and it doesn’t tie the language down to the idiosyncrasies of the current JavaScript-based Cene implementation.
  • I gave Cene a simple website, the beginnings of an API reference, and an example project. I put all the Cene and Era projects into their own GitHub organization so that their documentation wouldn’t be riddled with the name “rocketnia.” I’ve released Cene on npm where it can be installed as easily as any other major JavaScript library.
  • This year I’ve actually started to use the Cene language myself. I’ve started to write Mise-en.cene, a library/DSL for screenplay-like character dialogues.

This isn’t the kind of work that usually sets a language apart, so much as the work that puts it on the map, but to me it’s pretty new and exciting. These are skills that can translate: I’ve applied some of that same attitude and knowhow to contribute some things to the Arc language as well, making Anarki fit in better within the Racket language ecosystem.

Error messages, or the lack thereof

Unfortunately, there’s one more piece of neglected due diligence that makes it difficult for me to support anyone but myself in using Cene: Error messages. Usually, Cene crashes with a JavaScript stack trace when there’s an error, and I can’t track down the cause of the error without making exploratory modifications to the language implementation or opening up the generated JavaScript code. That’s almost tolerable for me, but only because I know Cene’s implementation inside and out.

Certain parts of Cene’s design are prepared for error messages. For instance, the string syntax makes it easy to wrap an error message on multiple lines of code without giving it weird spacing when it’s shown to the user, and Cene’s follow-heart operation could someday be useful for programmatic processing of errors.

Cene macros even take input that contains source location objects, which I would like to use to generate more informative error messages. For now, they contain no information; they’re just placeholders as I think about the requirements of the problem. When macros would generate their own intermediate syntax, I would try to determine what kind of source location should be associated with it.

Unfortunately, as soon as I started to define a string-based DSL, it became really clear that the source locations I had weren’t granular enough, in the sense that they weren’t telling me exactly the source location for every character in a string literal, down to its escape sequence. Even if they did provide this information, my macros were going to need to do a lot of ad hoc processing to deal with it properly.

So, I’ve been looking for a better way to deal with source locations in the macro system… a technique that treats them as being part of the “material” of the syntax rather than needing to be manually manipulated. Instead of parsing strings, my code would parse source-location-infused strings, for instance.
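As a small illustration of what a “source-location-infused string” could look like, here is a Python sketch. Everything in it (`Loc`, `LocatedString`, `first_unexpected`) is a hypothetical name made up for this example, not something from Cene, and it tracks a single line and column per character rather than the full span of an escape sequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loc:
    line: int    # 1-based line number
    column: int  # 0-based column number


class LocatedString:
    """A string in which every character remembers where it came from."""

    def __init__(self, chars):
        self._chars = tuple(chars)  # (character, Loc) pairs

    @classmethod
    def from_source(cls, text, line=1, column=0):
        chars = []
        for ch in text:
            chars.append((ch, Loc(line, column)))
            if ch == "\n":
                line, column = line + 1, 0
            else:
                column += 1
        return cls(chars)

    def __len__(self):
        return len(self._chars)

    def __str__(self):
        return "".join(ch for ch, _ in self._chars)

    def __getitem__(self, index):
        # Slicing or indexing keeps the locations attached, so code that
        # takes substrings never has to recompute them by hand.
        if isinstance(index, slice):
            return LocatedString(self._chars[index])
        return LocatedString([self._chars[index]])

    def loc_at(self, index):
        return self._chars[index][1]


def first_unexpected(text, allowed):
    """Return the Loc of the first character not in `allowed`, or None."""
    for i in range(len(text)):
        if str(text[i]) not in allowed:
            return text.loc_at(i)
    return None


source = LocatedString.from_source("ab\ncd!e")
second_line = source[3:]  # an ordinary slice; locations come along
print(str(second_line))                         # cd!e
print(first_unexpected(second_line, "abcde"))   # Loc(line=2, column=2)
```

The parsing code here never mentions line or column arithmetic. It slices and inspects the string as usual, and the location of the offending character is already there when an error needs to be reported.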

For programming in Era, I’ve long imagined that I wanted programming to be an activity that enhances communication rather than producing an artifact by deep deliberation. I imagined that Era might empower people to communicate in a way that is somehow enhanced by the abilities of programming, maybe by using a chat interface where the widgets of the interface were themselves the syntax of a program.

Now these concepts seem to be converging. Just as a syntax can implicitly contain source location data, perhaps it can also implicitly contain all the stylistic details and state of a UI widget. But then, of course these topics would converge, because the source location of a UI widget is naturally its location on screen (or in the keyboard focus hierarchy), which depends on stylistic details.

So as I think about polishing up error messages for Cene, I’m realizing that I basically have all I need to begin developing Era. Refactoring Cene’s macro system to improve error messages is a daunting task, but what about building a macro system that’s general enough that I can use it for both Cene and Era?

Change of implementation language

When I’ve tried to tell people about Cene, it’s been pretty frustrating. I get started wanting to talk about extensibility or hygiene or the FFI, but then I can’t show off code without having to explain some other features of Cene, especially the syntactic ones, since they’re the most visually distracting.

If I could only release them as individual tools with individual documentation, I could discuss them with people one piece at a time.

Unfortunately, syntactic features can’t very well be libraries in JavaScript, and the rest of my Cene code can hardly be released as idiomatic libraries in JavaScript because I use so much continuation-passing to work around JavaScript’s limited stack size.
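For readers who haven’t run into this style, here is a minimal sketch of continuation-passing with a trampoline, written in Python (which also has a limited call stack) rather than JavaScript. The names `Bounce`, `trampoline`, and `sum_to` are made up for this example and are not taken from Cene’s code. Every function returns a description of the next call instead of making it, so the stack never grows, but every caller has to supply a continuation and run the result through the trampoline, which is what makes such code awkward to offer as an ordinary library.

```python
class Bounce:
    """A deferred call: "next, call fn with these arguments"."""

    def __init__(self, fn, *args):
        self.fn = fn
        self.args = args


def trampoline(step):
    """Keep performing deferred calls until a plain value comes back."""
    while isinstance(step, Bounce):
        step = step.fn(*step.args)
    return step


def sum_to(n, k):
    """Compute 0 + 1 + ... + n and pass the result to continuation k."""
    if n == 0:
        return Bounce(k, 0)
    # Instead of recursing, describe the recursive call. The continuation
    # adds n to the partial total and hands it on to k.
    return Bounce(sum_to, n - 1, lambda total: Bounce(k, total + n))


# A direct recursive version would exceed Python's default recursion
# limit at this depth; the trampolined version runs in a flat loop.
result = trampoline(Bounce(sum_to, 100000, lambda total: total))
print(result)  # 5000050000
```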

But… in Racket… If any language lives the idea “future generations will have better ideas for programming languages than we do” today, I think Racket’s the standout example with its #lang feature. As I look into Racket deeper, I’m even sort of astonished by how Racket’s use of “structure type identifiers” is so similar to my policy for selective encapsulation of structs in Cene.

I believe I will be able to release many of Cene’s syntactic features as Racket reader extensions or macros, and possibly some of the semantic ones as well. Let’s see how far I can get before I stop writing Racket libraries and write a full language implementation again. When I do, writing it in Racket is going to be no more difficult than writing it in JavaScript, I’m sure.

I’ve started with writing the Parendown library, a simple sugar that drastically changes the appearance of my code. With that Racket library in place, my Racket code is looking a lot like my Cene code already.

Next up, I’ve spent the past week prototyping a generalized macro system that can better accommodate source locations and my quasiquotation innovations at the same time. I can try to use this kind of macro system for Cene or Era… but for now, I think I’ll try to use it for Racket. Maybe I can get it to integrate seamlessly with Racket’s own macroexpander.

Cene from here

Cene is far from over, unless I manage to build Era first. :) Even then, Cene’s plain text syntax is important for a lot of integrations with contemporary technology and culture, such as blog posts and version control diffs.

Since changing the name from Staccato to Cene, I’ve always wanted Cene to have the kind of design minimalism that makes it easy to write standards documents and alternate implementations like those of Scheme and SML, which means that I ultimately don’t want it to be tied to any unnecessary details of the host language.

Once Cene has good error behavior, I believe it will stand as a useful language of its own.

Who to thank

It was largely thanks to my past few years of employment that I felt the pressure to practice more of what goes into a software project, from the documentation to the unit tests to the build process to the databases. I believe it was because I read about the Twelve-Factor App at work that I decided to take my abstract design thoughts and turn them into a language that had good integrations with the command line and JavaScript, which motivated rechristening the project to Cene (named after the suffix “-cene,” designating relevance to a particular span of time).

The design of Cene’s monotonic state resources was something I almost got right in Blade in 2010, but over the past couple of years I’ve learned a lot about monotonicity and deterministic concurrency thanks to reading about Lindsey Kuper’s LVars and Michael Arntzenius’s Datafun. I had also been inspired in late 2015 by Arnold Beckmann’s slides for “Feasible Computation on General Sets” to regard orderless collections as a serious possibility for my languages.

I would like to thank all the people I’ve tried to talk with about Cene, including Alexis King and Michael Arntzenius, for motivating me to improve my documentation and break the features apart into easier-to-digest pieces. :) Those two have also inspired me lately with what they can accomplish by writing their languages in Racket, which I’m sure has affected my newest direction here.



