Lately I’ve been trying to iron out the details of a generalization of quasiquotation which I call “higher quasiquotation.” The basic idea is that just as a quasiquotation is a region in one parenthesis-delimited region (marked by
quasiquote) and a set of other parenthesis-delimited regions (marked by
unquote), we can go on to talk about regions between quasiquoted regions, regions between those regions, and so on.
If you think of values with holes as being functions, then the notion that this is a “higher-order” quasiquotation is clear: Each quasiquotation determines a value of type
(c SExpr -> SExpr), the next higher degree of quasiquotation determines a value of type
(c (c SExpr -> SExpr) -> (c SExpr -> SExpr)), and so on, where
c is some collection like
c a = Map String a. But these functions aren’t the whole story; the quasiquotations should be able to be pulled apart like other data structures, not just filled in to create s-expressions.
I haven’t managed to write a full macroexpander for higher quasiquotation yet. I’ve written this post to share my status as it is.
Cene is a language I’ve built over the last couple of years. I’ve talked about Staccato and Tenerezza here, and that code has turned into Cene.
What sets Cene apart: Extensibility support
Cene’s design revolves around the primary idea that future generations will have better ideas for programming languages than we do, so most of what sets it apart is its support for custom languages, which mainly has to do with the design of its macroexpander.
Cene’s macroexpansion phase incrementally writes definitions (of macros, functions, etc.) to monotonic state resources using deterministic concurrency. These state resources are expressive enough that user-defined macros can use them to achieve combinations of open-world and closed-world extensibility, which is what I consider to be Cene’s primary feature.
This is a more personal entry than usual. This year, several important things have happened in my life already.
Tenerezza is distinguished from mainstream languages in a few ways:
Tenerezza is an untyped language where all values are distinguished by sealer/unsealer tags. Even function closures are unencapsulated to someone who has the right unsealing permission.
Tenerezza’s first-class values are always sets. In fact, you can only unseal an element at the time you loop over its container because you can’t manipulate a single element directly. This builds on some existing work around doing computation on sets.
Tenerezza supports general recursion, but only by way of a computation monad (namely, returning a stack of functions to call). Otherwise if a Tenerezza computation’s set inputs are a constant size, it takes constant time. This is meant to make it easy to visualize, step through, patch, and persist the dynamic shapes of a Tenerezza computation.
Actually, Tenerezza’s first-class values aren’t merely sets. Every first-class value in Tenerezza carries an input set, but it also receives an output set. Overall, it represents a communication channel.
This actually means Tenerezza programs may have causality paradoxes as inputs loop around to become outputs and back, but these infinite loops should be about as easy to debug as the infinite loops today’s programmers are used to.
The macro layer
Although the Tenerezza language does some desugaring so programs can be written in a convenient format, the Tenerezza language is going to be pretty verbose to use directly. Possibly the biggest issue is that a Tenerezza program needs to have names denoting every one of its cheap steps, and these special-purpose, implementation-dependent names clutter the code.
I’ve designed a macro layer. Macros can locally convert custom surface syntaxes into the standard Tenerezza sugar, and Tenerezza can then do nonlocal desugaring transformations to decompose the program into cheap steps.
I’ve put together the macro layer in an extensible way, using some late binding techniques I learned making Penknife. Unlike in Penknife, this time I’m not reliant on global method tables; I’m now using an idiom where the method is a data structure, and an object is a function that takes a method as an argument. This is a pretty simple idiom, and I might revise it in various quirky OO ways to support things like
super and meta-object protocols, but I’m happy that these quirks will not pollute the global definition semantics.
Taking it from here
This is shaping up to be a small design that covers most of the bases of what I’d want for general-purpose programming. Once some further details of the design are fleshed out, such as macro hygiene and code signing, I may start implementing this and using it.
It’ll probably be slow at first since the implementation will be passing around sets, but thanks to Tenerezza programs’ decomposition into cheap steps, it should be pretty easy to profile a program and figure out which specific subprograms could benefit from optimization.
It’s been a couple of years since I posted here! Somehow 2014 turned into an opportunity for me to practice forms of expression other than programming, but that’s not what I’m here to talk about. I still spent lots of time on language design throughout 2014, and I’ve been zooming along in the new year.
Ethics for language design
I still dream big enough in my language design that ethics is an important consideration, but I’ve mellowed down when it comes to long-term ethics, and I’ve riled up a little about short-term ethics. :) Whatever happens, I think we’ll eventually build systems for better communication throughput between people, reducing the kind of violent pressure releases I was afraid of. After all, whatever organizations have better communication methods will probably become more intelligent as organizations, and they’ll outcompete the others. We just have to worry about how brutal that competition will be. So I figure we should foster egalitarianism in ways that cultivate competitive markets. Does that make me a left-libertarian? I don’t even know.
Pragmatics for language design
At this point I think of a programming language as something that has niche value. “Programming” is a rather nebulous term itself, and “language” describes the skill you use rather than the reward you get. As user interfaces go, programming languages are optimized for tasks where the user will have a) a long time to prepare their input, b) a higher tolerance for complexity than for redundancy, and c) a rather strong commitment to their chosen code once it’s deployed out of their reach. Well-designed UIs avoid such high complexity and commitment burdens, so my expectation as a programming language designer is to put myself out of a hobby.
Between complexity and commitment, commitment is the more essential problem. Much of the complexity in programming can be traced back to commitment thresholds: Either the bit is set, or it’s clear, never in between. Furthermore I think it’s plausible to trace this back to the mind-body threshold: Human minds are so disconnected from each other that we insist on personal identity, and this insistence lends a sense of absolute discreteness to so many of the concepts we form. (Although I’m attributing this to humans, this might be more specifically a Western trend. My perspective is too myopic to tell.)
On the other hand, not all complex artifacts are deployed with a high commitment cost. Sometimes people deploy complex artifacts because they can’t help it, leaving behind fingerprints and memories. We might want to take advantage of this, focusing on programming languages that help us interpret found artifacts in a useful way.
If I’m on the right track here, then programs would do best to be shaped like some kind of fingerprint, and encapsulation boundaries in programming would do best to behave like the encapsulation boundaries of people. That way we’re not introducing unnecessary concepts. Well, people are vaguely like modules: Not only is a program module encapsulated in a way vaguely similar to a person, but orderless sets of interacting modules are vaguely similar to orderless sets of interacting people. When modules interact, their interaction membrane, if we took a fingerprint of it, would be their import/export type signature. Maybe a type signature is a fingerprint for person-to-person interaction too.
Based on this train of thought, we might like to find a language with only type signatures. If we took a typed functional language with implicits or type classes, removed everything but the type signatures, and tried to program with it anyway, we would accomplish some form of logic programming. Maybe logic programming languages are on to something.
Below the cut, I’ll list some of the language projects I’ve worked on over the past year or two. The above philosophical premises will be relevant for a few of them, but I won’t refrain from discussing tangential features and challenges I’m excited about. If you’d like to avoid most of the technical meat-fluff and get back to the philosophical fluff-meat, I recommend skipping to the section titled “Era Tenerezza” and reading from there. :)
Several years ago, I was excited about pouring lots of features into a programming language design, so I asked various people what they would look for if they wanted to use a programming language, even if they had never programmed before. The most common answer was “good error messages.”
I’m rarely frustrated by errors, so I haven’t had much of a basis to think about how they could be handled or messaged better. But recently, thanks to two online discussions (a discussion between David Barbour and me about API usability, and an LtU thread about static-vs-dynamic language lifecycles), I’ve reached some very specific conclusions.
I’m going to make a distinction here between an error mechanism and a design hole. I consider design holes to be the real errors in a program, and the error mechanism is just something that typically happens when a program falls into a design hole.
I’m gradually figuring out a foundation for a general-purpose programming language, and I think I just laid a great cornerstone, which seems to solve the expression problem for dependent type theory. My pseudocode is in this long GitHub Gist, which comes with a long revision history showing my progress over the last 12 days.
I haven’t yet looked for a proof of strong normalization, consistency, and whatnot (and I don’t even intend for my theory to be expressive enough to support induction!), but the final insight has turned out to be very straightforward: If we can extend an extensible type and reimplement its interface (and re-prove its invariant) so that the new implementation/proof is observationally equal to the original as far as the original cases are concerned, then our extension may as well have been part of the type all along.
I’m using the observational equality infrastructure described by Altenkirch and McBride in “Observational Equality, Now!” and the way I think of the expression problem pretty much lines up with the requirements listed in Zenger and Odersky’s “Independently Extensible Solutions to the Expression Problem.”
On December 20, just in time for the Mayan apocalypse, I thought of an approach to computer programming that unites my meaning-preserving modularity, some of David Barbour‘s RDP vision, and my own philosophical worldview.
I’m calling it Reactive Knowledge Networking. It takes the philosophical idea that a person does nothing with the world except observation and action, and it uses that idea to facilitate people’s communication with each other, with minimal (if any) computer configuration bureaucracy along the way. Its network structure is very similar to RDP, and it uses meaning-preserving modularity to encode the partial knowledge a person has observed.
Sharing code is like sharing knowledge—the knowledge of how to write that code. I think we can simplify the meaning of program imports by making them queries to an oracle of all possible shared code.
This frees us from some complexity in maintaining the modules installed in a system: The modules’ effects on what the language can accept is declarative—as in they’re commutative, associative, and idempotent, like a mathematical set—and their effect on what the language’s programs mean is zilch. If the module oracle fails to find a match in its search, that just means there isn’t enough knowledge installed, not that the application itself is defective or should behave in a different, unintended way (like choking at run time).
When I talk about searching “all possible shared code,” I do mean searching every revision of every library, including every possible future revision, where the names may have been replaced with every possible combination of other names. This means we can’t identify modules by name, the way most module systems do.
I call it Cairntaker. It’s for programmers who want well-tended heaps.