Sharing code is like sharing knowledge—the knowledge of how to write that code. I think we can simplify the meaning of program imports by making them queries to an oracle of all possible shared code.
This frees us from some complexity in maintaining the modules installed in a system: The modules’ effects on what the language can accept are declarative—as in they’re commutative, associative, and idempotent, like a mathematical set—and their effect on what the language’s programs mean is zilch. If the module oracle fails to find a match in its search, that just means there isn’t enough knowledge installed, not that the application itself is defective or should behave in a different, unintended way (like choking at run time).
When I talk about searching “all possible shared code,” I do mean searching every revision of every library, including every possible future revision, where the names may have been replaced with every possible combination of other names. This means we can’t identify modules by name, the way most module systems do.
I call it Cairntaker. It’s for programmers who want well-tended heaps.
Recently I find it compelling to view programs in terms of their dataflow, since sometimes the whole dataflow graph can be generalized to some other purpose. I think it could be interesting to have a lisp variant whose surface syntax was not a tree but instead a dataflow graph. Last post, I described this train of thought in more detail. Now I’ll go into detail about my own partial ideas for generalized dataflow syntax.
The thing about syntax is, all the familiar syntaxes we use are flat enough to fit as text, and we group them by locality into hierarchies. That’s not something I’m trying to revolutionize right now; instead I’m looking for a way to construct these dataflow graphs using a surface syntax similar to what we already use.
Reactive programming and dependent typing have opened my eyes to an interesting way to view syntax. Now if you’ll bear with me, I won’t get to syntax until the end of the post.
In programming languages we often build “data” out of sums (either this kind of thing or that kind of thing) and products (this kind of thing and that kind of thing together), and every serializable notion of data, like binary, Unicode text, JSON, etc. falls into that pattern pretty easily. But right down at the lowest level, when we encode a binary bit (or boolean value) as “either 0 or 1”… when the “kind of thing” is a 0, what does it take to store it? Nothing; if we know we’re trying to read something of kind 0, we already know the value without even looking at the data. The same goes for 1. It turns out a bit is just “either this or that” when we only care about the branch itself, not what this or that actually represent. Our algorithms could actually be operating on secretly richer data structures, and they just happen to ignore that substructure.
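To make that concrete, here’s a minimal sketch in Python (my own illustration; the class names `Unit` and `Sum` are made up, not any particular language’s encoding). A value whose branches carry no information needs nothing but the branch tag itself to serialize:

```python
# Sums ("either") built from units ("nothing to store"), sketched in Python.
# Hypothetical encoding for illustration only.

class Unit:
    """A kind of thing with exactly one possible value."""
    def serialize(self):
        return ""  # if we know the kind, there's nothing left to store

class Sum:
    """Either left or right. Only the branch choice needs storage."""
    def __init__(self, is_right, value):
        self.is_right = is_right
        self.value = value
    def serialize(self):
        return ("1" if self.is_right else "0") + self.value.serialize()

# A bit is just a sum of two units: all the storage is in the branch.
zero = Sum(False, Unit())
one = Sum(True, Unit())
print(zero.serialize(), one.serialize())  # 0 1
```

If the branches held something richer than `Unit`, the same `Sum` machinery would still work; the bit-ness is just the degenerate case.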
In most languages, we can hold state containers and external references in our data structures, even though they don’t have the same serializable nature as sums or products. Many utilities that operate on data structures (say, a utility that reverses a list) will simply leave these exotic values alone. The secretly rich substructure actually exists, and it’s crucial to the other parts of the program.
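A tiny Python example of what I mean by utilities leaving exotic values alone (my own sketch, not from any library): a list reversal never inspects its elements, so a mutable state container rides along untouched.

```python
# A list-reversing utility is polymorphic in its element type: it never
# looks inside the elements, so "exotic" values like mutable state
# containers pass through unchanged.
def reverse(xs):
    return xs[::-1]

cell = {"count": 0}           # a mutable state container
data = [1, cell, "text"]
rev = reverse(data)           # ["text", cell, 1]

rev[1]["count"] += 1          # mutate through the reversed list...
assert cell["count"] == 1     # ...and it's the very same container
```

The reversal shuffles references around; the secretly rich substructure survives for whatever other part of the program depends on it.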
A more common word to use to discuss this secrecy is polymorphism. If a utility really doesn’t care about what type it’s dealing with, it’s polymorphic in that type. Often things are polymorphic only up to some condition on the types involved, and conditions on individual types tend to be represented as types of their own. Implications between conditions are subtyping relations: A is a subtype of B iff A’s condition is sufficient to ensure B’s condition.
Reactive programming turns programming on its head in the following way: Suppose sums represent decisions that change over time, rather than eternally fixed, serializable bits. It turns out most of our programming patterns are actually polymorphic enough to continue making sense under this interpretation. Naively, we can just run our programs over and over every time a decision changes.
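Here’s the naive strategy as a runnable Python sketch (the `Cell` class is hypothetical, not any real FRP library): whenever a decision changes, just re-run the whole program.

```python
# Naive reactive programming: re-run the whole program every time a
# "decision" (an input cell) changes. Hypothetical minimal sketch.

class Cell:
    def __init__(self, value, on_change=None):
        self._value = value
        self.on_change = on_change
    def get(self):
        return self._value
    def set(self, value):
        self._value = value
        if self.on_change:
            self.on_change()

outputs = []
def program():
    # An ordinary-looking program; it just happens to read a cell.
    outputs.append("big" if flag.get() else "small")

flag = Cell(False, on_change=lambda: program())
program()          # initial run
flag.set(True)     # the decision changes; the whole program re-runs
print(outputs)     # ['small', 'big']
```

The point is that `program` itself didn’t need rewriting; it was already polymorphic enough that the same code makes sense whether the sum is fixed forever or changes over time.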
Dependent typing turns programming on its head in a similar way: Suppose every value is associated with some other value (its type) that represents what we know about that value at compile time. Unlike in many kinds of static typing, dependent typing breaks down boundaries that typically distinguish types from values, and a polymorphic utility can potentially work on types as easily as it works on values, so that some of our existing programming patterns can essentially be used for extending the compiler.
That’s one ideal of dependent typing anyway; in dependent type systems, there’s a tradeoff between ease-of-use and the mathematical rigor of “what we know about that value.” Mathematical rigor isn’t really something we can have halfway (though there are multiple ways we can appraise it, e.g. multiple foundational theories relative to which our own theory can be consistent or expressive), so lots of research effort goes into pushing the boundaries of usability while preserving desirable proof-theoretical properties. For instance, if we naively allow arbitrary (potentially nonterminating) computation or allow full polymorphism on some type of all types (including this one), we open the door to mathematical inconsistency, and our compile-time knowledge is no longer trustworthy.
In effect, dependent typing introduces some new exotic values to our data structures (the exotic values being types), and certain kinds of dependent typing classify nonterminating functions as being too exotic to fit.
Back to syntax
I opened with a mention of syntax, then blitzed through data, polymorphism, and these alternate kinds of dataflow on my way to the point.
Lisp gets pretty good mileage out of its syntax. When it handles program fragments as hierarchical lists of symbols, they’re about as intuitive to manipulate as any other list-structured data.
The thing is, encoding a program as a hierarchical structure in the first place is finicky business, with variable scoping issues to worry about. Now that I’ve seen that there are meaningful ways to reinterpret whole programs, I wonder how much more mileage a lisp variant would get if its surface syntax took care of scoping issues itself and left us with just a dataflow graph to manipulate.
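To show what “just a dataflow graph to manipulate” might look like, here’s a sketch in Python (my own made-up encoding; the post doesn’t fix a representation). Nodes refer to each other by name, so there’s no lexical scoping to get wrong when you rearrange the graph:

```python
# A program as a dataflow graph: plain data, no variable scoping issues.
# Each node is an operation plus the names of the nodes that feed it.
graph = {
    "a":   ("input",),
    "b":   ("input",),
    "sum": ("add", "a", "b"),
    "out": ("emit", "sum"),
}

def evaluate(graph, node, env):
    op, *deps = graph[node]
    if op == "input":
        return env[node]
    vals = [evaluate(graph, d, env) for d in deps]
    if op == "add":
        return vals[0] + vals[1]
    if op == "emit":
        return vals[0]
    raise ValueError(op)

print(evaluate(graph, "out", {"a": 2, "b": 3}))  # 5
```

Manipulating this graph is just dictionary manipulation; renaming a node consistently or splicing in a subgraph can’t accidentally capture a variable the way tree-shaped macro surgery can.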
Unfortunately, I think this surface syntax would inevitably be more verbose. If we assume little about the programming model at hand, we don’t get to fine-tune the syntax to work in the best way for that model. Additional syntactic abstraction before this step, in the form of a hierarchical macro system or a context-free parser, may make it more palatable.
During this time, the potential value of static type systems has become quite clear to me. For the most part I still really don’t care about protecting programmers from their own mistakes. Instead what I value is the ability to make promises about program behavior that untrusting parties can verify on their own (for non-hackish metaprogramming and secure code distribution), the ability to write expressive abstractions that don’t sacrifice any run-time performance, and the ability to infer a program’s implementation from its interface. I still like the dynamic programming ideal of putting lots of power in the programmer’s hands… but whence else comes this power? Rather than stubbornly reinventing the existing static type research under some other name, I embrace it.
I haven’t felt the pain of the asynchronous callback nesting problem firsthand, but I know a horror close to that one: Monadic programming in Groovy. Observe the indentation.
In fact, that indentation-heavy part of the Blade code is what originally got me fed up with using Groovy and interested in making Penknife as an intermediate project on the way to Blade. Well, my work on Penknife has been so focused on modularity and hygiene that it’s slack on what really matters: Syntax. Oh, syntax, how you really really matter so! :-p
There’s one design goal of Arc I find completely self-evident. Arc’s designed for good programmers. Duh. Programmers in the future will be better than us, because they’ll be capable of observing our mistakes. If a language doesn’t target good programmers, it doesn’t have a very bright future.
There’s way more to this concept than Arc tackles, though.
Programmers in the future will learn from the mistakes we make designing languages today, and they’ll design better ones. For there to be a single language that persists despite future innovations, snowballing its own library support along the way, it ironically needs to be all the various languages people want it to be. It needs to be a customizable language. (And it needs to target platforms other than its runtime.)
Any two of those languages will easily target each other’s runtimes, so if things happen the way I expect, we’re likely to find a landscape full of competing languages that all abstract over each other with no clear winner.
To limit that effect, it’s important for those languages to try to have no obvious defects from the outset, so that they don’t build up audiences that feel they need to spin off into various camps to pursue slightly better languages.
For this reason, I think it’s a mistake for any language that takes itself half-seriously to impose an arbitrary limitation on its users. A limitation that makes other things easier is fine, but the point of a tool is always to make people more productive, not less. Good programmers will police themselves as necessary.
But for the same reason, I think the strongest position in a winnerless landscape of customizable languages is a minimalistic one. If a customizable language’s runtime has only the features it needs for convenient syntax customization, the designer has fewer opportunities to mess up and introduce flaws. In that sense, the language should be severely limited, but for a non-arbitrary cause.
Meanwhile, each of these languages will itself host a lot of competing syntaxes and abstraction layers. Not all of those will follow the language’s own philosophy, ’cause they’ll be their own things. But many will be general-purpose tools for people to use when customizing the language’s syntax, and their presence will effectively recolor the language for people who want to take advantage of their added features. I believe the language should have a strong philosophy encouraging consistency between these libraries and the main language runtime API, so that the experience doesn’t fray into separate and competing sub-experiences.
But a language’s library landscape is anything but minimalistic, so it can’t share that aspect with the core. Libraries will inevitably introduce extra concepts with flawed designs and buggy implementations. Instead of trying to avoid this, library programmers should assume the programmers who use their code know better than they do, by nature of those programmers being in the future. Thus, when in doubt, they should leave their implementation details exposed rather than hiding them. They should maximize their users’ freedom, just like the language itself does.
A naive take on this would be challenging for libraries in active development. They tend to change their implementation details from release to release in incompatible ways, so if they feel pressured not to break others’ code, that limits their freedom! To solve this, I believe library users should be capable of using unstable features, but that the language culture should encourage library writers to make it clear what’s stable and what isn’t, perhaps in such a way that even if it’s easy for library users to use the unstable parts, it’s also easy for them to refrain from doing that.
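One way such a culture could work in practice, sketched in Python (the `unstable` decorator is hypothetical, just my illustration of the idea): unstable features stay callable, but using them is conspicuous and easy to refrain from.

```python
# A made-up convention for exposed-but-unstable library features:
# users CAN call them, but a warning makes the instability visible.
import functools
import warnings

def unstable(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        warnings.warn(f"{fn.__name__} is unstable and may change",
                      FutureWarning, stacklevel=2)
        return fn(*args, **kwargs)
    return wrapper

@unstable
def experimental_sort(xs):
    # An implementation detail the library exposes anyway.
    return sorted(xs)

print(experimental_sort([3, 1, 2]))  # [1, 2, 3] (with a FutureWarning)
```

The library writer keeps the freedom to change `experimental_sort` between releases; the user who depends on it anyway does so with eyes open.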
So altogether, I believe the best kind of language is a minimalistic, customizable-syntax, cross-targeting build language which has a philosophy encouraging consistently unstable library APIs, makes nothing intentionally hard, and most of all prepares for future innovation.
Not only is this my kind of language, I think it’s imperative to make these languages extremely well from the beginning so that we don’t let suboptimal versions gain the advantage of entrenchment.
Platforms are not languages to me. Languages don’t deal with security, file operations, threads, and so on. Languages are syntaxes people use to develop tools on platforms.
Platforms stack, vaguely. The LAMP stack is a classic example: The server is Linux running Apache, content-specific sub-platforms are developed on top of PHP or Perl for Apache to use, data sub-platforms are developed on top of MySQL for the PHP/Perl code to use, and the machine code programs (the OS included) rely on a hardware platform of some sort or another.
Platforms also target other platforms and hide them away. For instance, a PHP coder hardly cares about the instruction set in use on the machine, ’cause the interpreter takes care of that. It’s easy to write a simple PHP script that doesn’t really care what the hardware, the OS, the server, etc. look like. When PHP itself is all we’re thinking about, I’d say PHP is our platform.
The reality today is that platforms and languages are developed symbiotically. Innovative general-purpose platform concepts spawn new languages suited for them, like Erlang. Languages which intend to support multiple platforms just end up defining their own. Most platforms can’t be perfectly translated into each other, so ports of platform-defining languages can only be sparse or imperfect, and when it comes down to it, one’s choice of platform determines one’s language options.
Many programmers, including myself, prefer to choose languages not based on the platforms they target but based on their smoothly adapting syntax as project-specific needs come up. Languages which support extensible syntax, like most lisps, are ideal for this purpose, since they hold an implicit infinity promise that they’ll eventually be every bit as beautiful and convenient as every other language, if people only bother to develop the necessary syntax extensions.
But customizable languages fall short today. People are used to developing programs that target just one platform—the runtime of the language they’re writing in—and that makes customizable languages only as flexible as the one runtime they’re limited to.
For a language to have an inflexible runtime is important for code reuse, but if we’re going to have a convenience-and-aesthetics language that’s capable of continuing to build momentum over the next, say, 100 years (Arc’s shtick), we need a language that will target 100 years of platforms we haven’t invented yet. Therefore, the platform the language code runs in and the platform it targets (usually) won’t be the same.
Instead, the language needs to encourage a development process that’s like writing a compiler or build script, where the code produces project artifacts like binaries and other-language scripts, and those are all that’s deployed.
If we start with a “compiler” that just outputs what it’s told to output, then writing code generators in the language will already be at least as convenient as writing the code manually, so the infinity promise will already be fulfilled, albeit unimpressively. Templating languages are even a step beyond this, but flat string concatenation is an ineffective way to go forward. Instead, I believe the customizable language’s core focuses should be on AST manipulation, parsing, dependency resolution, and other compiler duties.
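Here’s the progression as a Python sketch (my own illustration; the node shapes and target language are made up): the verbatim “compiler” baseline, then one step up to rendering a tiny AST instead of concatenating flat strings.

```python
# Baseline: a "compiler" that just writes out whatever target-language
# text it's handed. Writing a generator is already no worse than
# writing the output file by hand.
def compile_verbatim(source_text, out_path):
    with open(out_path, "w") as f:
        f.write(source_text)

compile_verbatim("console.log('hello');\n", "out.js")

# One step up from flat strings: a tiny AST rendered to the target.
def render(node):
    kind = node[0]
    if kind == "call":
        _, fn, args = node
        return fn + "(" + ", ".join(render(a) for a in args) + ")"
    if kind == "str":
        return repr(node[1])
    raise ValueError(kind)

print(render(("call", "console.log", [("str", "hello")])))
# console.log('hello')
```

Once the artifacts flow through an AST, the other compiler duties—parsing into it, resolving dependencies between generated pieces—have something structured to hang onto.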
This is the direction I’ve been taking Blade and Penknife, my own programming languages. Platform innovation is well and good, but I believe truly general-purpose languages are nothing if not homes for syntax innovation.
```
$ cd /path/to/jarc
$ java -server -cp \
>   "jarc.jar;$GROOVY_HOME/embeddable/groovy-all-1.7.2.jar" jarc.Jarc
Jarc> (= lathe-dir* "path/to/lathe/arc/")
"path/to/lathe/arc/"
Jarc> (load:+ lathe-dir* "loadfirst.arc")
nil
Jarc> (use-rels-as jv (+ lathe-dir* "imp/jvm.arc"))
#3(tagged mac #<procedure>)
Jarc> (jv.jvm!groovy-util-Eval-me "2 + 2")
4
```
It’s on now.
After about a month of me submitting bugs to the Arc forum and Jarc’s creator, JD Brennan, fixing them, Jarc is now compatible enough with Arc 3.1 that my Lathe (the library I introduced last time) now wholeheartedly supports the Jarc implementation of Arc. This means it works on four Arc setups: Jarc, official Arc 3.1 (which itself needs a Windows fix), Anarki (which needs the same fix), and Rainbow. Most of you reading this are probably from the Arc forum, in which case you knew all that already. :-p
Those are all the setups I want Lathe to support for now, but I might consider arc3f, arc2c, or another Arc implementation if I realize it’s actually active. Another complication is that quite a lot of Arc users modify one of the PLT Scheme versions of Arc (Arc or Anarki) to suit their needs. In those cases, I figure the burden of Lathe compatibility is on them if they need it.
Still, there is a bit of a need to add new features to Arc. To this end, Anarki, Jarc, and Rainbow have provided ways to call from Arc into the host platform (PLT Scheme, the JVM, and the JVM respectively).
I’ve been doing things here or there in Arc without really having a specific plan in mind for sharing them. Most notably, I had the fundamentals of an untested Arc multimethod system lying around, which were made in an effort to collect my thoughts about Blade, a potential idea for an everything-is-a-multimethod language. And that’s what I expect a large part of my projects to be: Approximations of a language I’d rather program in. So finally, last week I started up a GitHub repo, Lathe, where I could submit miscellaneous utilities that smoothed out non-Blade languages according to my own aesthetic.