Recently I’ve found it compelling to view programs in terms of their dataflow, since sometimes the whole dataflow graph can be generalized to some other purpose. I think it could be interesting to have a lisp variant whose surface syntax was not a tree but instead a dataflow graph. In my last post, I described this train of thought in more detail. Now I’ll go into detail about my own partial ideas for a generalized dataflow syntax.
The thing about syntax is, all the familiar syntaxes we use are flat enough to fit as text, and we group them by locality into hierarchies. That’s not something I’m trying to revolutionize right now; instead I’m looking for a way to construct these dataflow graphs using a surface syntax similar to what we already use.
The hierarchical representation
Here’s a rough sketch of an AST for this kind of syntax, as a Haskell type:
```haskell
data Expr
  = LetScope String Expr               -- LetScope scopeName body
  | Decl String String Expr            -- Decl scopeName varName aliasOp
  | Var String                         -- Var varName
  | Call Expr (Bag Expr) (Bag Expr)    -- Call operation inArgs outArgs
  | Aside Expr Expr                    -- Aside sideCalculation body
  | OneOf Expr                         -- OneOf collection
  | Op (Bag Expr) (Bag Expr)           -- Op inArgs outArgs
  | Collection Expr                    -- Collection blueprint
  | Kw String Expr                     -- Kw keywordName value
```
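The `Bag` type above is left abstract. As a minimal sketch (an assumption on my part, not part of the design), it could be an unordered multiset modeled as a list whose order we promise not to observe; the helpers `unitBag`, `emptyBag`, and `listToBag` match the names used in the sugar examples later in this post:

```haskell
-- A hypothetical concrete Bag: a multiset backed by a list. Nothing
-- here enforces unorderedness; that's a convention of use.
newtype Bag a = Bag [a] deriving Show

emptyBag :: Bag a
emptyBag = Bag []

unitBag :: a -> Bag a
unitBag x = Bag [x]

listToBag :: [a] -> Bag a
listToBag = Bag

bagToList :: Bag a -> [a]
bagToList (Bag xs) = xs
```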
With `LetScope`, we’re able to decouple the syntax for introducing a scope from the syntax for introducing a variable within that scope (`Var` is equivalent to `Decl`, but it only uses an existing variable rather than declaring one).
The `aliasOp` parameter of `Decl` determines how the values going into that variable should transform into the values coming out of it. In most programming languages, a variable is just a name given to a single value, and that value is determined in one place and used in multiple places. In linear-typed programming, however, variables are often (or exclusively) single-use, with their values representing things that can’t be so easily duplicated or thrown away.
In order to make sense of both linear variables and multi-use variables, we use the `OneOf` syntax, which represents one value of a “collection,” where collections correspond roughly to the exponentials of linear typing. (In fact, this whole thing is a rough sketch, so in a working model it might actually correspond pretty well.) In order to let multiple kinds of variables be used in the same program, we specify a custom operation, `aliasOp`, that maps the variable’s input uses to its output uses.
In fact, every expression can be used in an “input use” or an “output use,” or even a “neutral use” (as the `sideCalculation` expression of an `Aside` form). In an input use, the expression receives an input from its parent expression, like a pattern match. In an output use, the expression provides output to its parent expression, as in basic arithmetic expressions. In a neutral use, it provides neither input nor output; it communicates only through its own subexpressions (which nevertheless may still use variables to access the outside context).
The `Call` form lets us manually call an operation with certain inputs and outputs (including the input or output associated with the `Call` form’s parent expression). Note that these inputs and outputs are unordered bundles (multisets, or bags), just as they would be if the operation were used to alias a variable.
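To make this concrete, here’s a hypothetical `Call` with two inputs and two outputs, imagining a “divmod” operation that produces a quotient and a remainder. The `Expr` and `Bag` definitions repeat a reduced version of the sketch above, with `Show` derived so the value can be inspected; the operation name and variables are invented for illustration:

```haskell
-- Reduced sketch of the post's types, just enough for this example.
newtype Bag a = Bag [a] deriving Show

data Expr
  = Var String
  | Call Expr (Bag Expr) (Bag Expr)
  -- (other constructors from the sketch omitted)
  deriving Show

listToBag :: [a] -> Bag a
listToBag = Bag

-- A call whose outputs are also a bag: both the quotient and the
-- remainder flow out into variables, with no ordering between them.
divmodCall :: Expr
divmodCall =
  Call (Var "divmod")
    (listToBag [Var "a", Var "b"])
    (listToBag [Var "q", Var "r"])
```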
As you might expect by now, the `Op` form outputs a value that can be used as an operation in `Call` forms. It defines this value in terms of a bag of inputs and a bag of outputs, which correspond to the parts of a `Call` form. In fact, the `Op` form defines only a single-use operation, and as such, the variables it captures are only augmented with a single input or output per capture.
In order to match up the expressions at the operation usage site with the expressions at the operation definition site, we may need to annotate them with additional disambiguating information. This is what `Kw` is for: it gives us a kind of nominal typing that amounts to a keyword argument system.
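Since arguments live in unordered bags, keywords are what tell two same-shaped arguments apart. Here’s a small sketch of that matching step; `findKw` is a hypothetical helper (not part of the design above), written against a reduced `Expr`:

```haskell
-- Reduced sketch: just the pieces needed to demonstrate keyword lookup.
newtype Bag a = Bag [a] deriving Show

data Expr = Var String | Kw String Expr deriving (Eq, Show)

-- Find the argument carrying a given keyword, if any. In a bag of
-- arguments, the keyword is the only way to pick out a specific one.
findKw :: String -> Bag Expr -> Maybe Expr
findKw name (Bag args) =
  case [e | Kw k e <- args, k == name] of
    (e:_) -> Just e
    []    -> Nothing
```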
We represent multi-use operations as collections of single-use operations. To define them (and other collections), we use the `Collection` syntactic form. Inside the `blueprint` of a collection, the scope is completely reset; it has no variables or scopes accessible to it. As an exception, if a `OneOf` form appears inside a `Collection` form and uses only variables and scopes which are not in scope inside that collection form but are in scope just outside it, that `OneOf` form is allowed. Effectively, we can only define a collection in terms of other collections.
The graph representation
The point of this syntactic system in the first place is so that we can write a program hierarchically using familiar idioms but then immediately have it in a more graph-like form for manipulation. Here’s how that graph might look:
```haskell
data FlowVar
  = Param String     -- Param varName
  | Local String     -- Local mangledVarName

data OpCollectionParam
  = Capture FlowVar  -- a collection in the surrounding scope
  | Input            -- an input of an element of this op collection
  | Output           -- an output of an element of this op collection

data FlowNode
  = NodeCall FlowVar (Bag FlowVar) (Bag FlowVar)
      -- NodeCall op ins outs
  | NodeOneOf FlowVar FlowVar
      -- NodeOneOf collection out
  | NodeKw String FlowVar FlowVar
      -- NodeKw keyword value out
  | NodeOpCollection (Map String OpCollectionParam) (Bag FlowNode) FlowVar
      -- NodeOpCollection subflowParams subflow out
  | NodeValCollection FlowVar FlowVar
      -- NodeValCollection constructorOpCollection out
```
An `Expr` would be compiled into a `(Bag FlowNode)`, with its free variables becoming `FlowVar`s. This bag of `FlowNode`s describes a kind of static single-assignment form for the dataflow, with the assignee variables appearing in the “`out`” position. In fact, each `FlowVar` in the bag will only be used once, too; the `FlowVar`s are the edges of the graph, connecting the `FlowNode`s. (These Haskell types don’t enforce these constraints, if you were wondering. Manual validation might be required, as in an untyped implementation.)
Once we have this graph, we should interpret/compile it according to some other semantic model: whatever model we like that makes sense with this kind of graph. It doesn’t need to make sense with all parts of this graph; I expect most models will disallow all top-level params except for a few, corresponding to the built-in operations and primitive values of the model. Many models should probably disallow the use of `NodeKw` as anything but a syntactic disambiguator, and strictly linear models might disallow collections altogether (possibly ending up with nothing but `NodeCall`!). Of course, whatever a model doesn’t disallow, it should implement.
Example sugar on the hierarchical side
Here’s how someone might define the familiar syntaxes of untyped lambda calculus on top of this syntactic system:
```haskell
-- We actually don't care about explicit scope; we'll just have each
-- variable shadow any other with the same name in the parent scope.
--
-- All variables have the same kind of aliasing rules; their aliasing
-- operation takes one value and outputs one collection of any number
-- of the same value.
-- TODO: When implementing this model, special-case a global variable
-- named "alias", holding a collection of these free operations.
--
-- Furthermore, we only declare them as parameters of lambda
-- expressions, which are in fact our only abstractions and our only
-- scope-establishing constructs.

get :: String -> Expr
get name = OneOf (Var name)

lambda :: String -> Expr -> Expr
lambda param body =
  Collection (LetScope "s"
    (Op (unitBag (Decl "s" param (get "alias")))
        (unitBag body)))

call :: Expr -> Expr -> Expr
call fn val = Call fn (unitBag val) emptyBag
```
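To see what this sugar actually produces, here’s a self-contained run of it that builds the identity function applied to a free variable, `(\x. x) y`. It repeats the relevant parts of the `Expr` and `Bag` sketches, with `Eq` and `Show` derived as an addition so the result can be compared:

```haskell
-- Repeats the post's sketches, reduced to the constructors the sugar
-- uses, so this compiles on its own.
newtype Bag a = Bag [a] deriving (Eq, Show)

unitBag :: a -> Bag a
unitBag x = Bag [x]

emptyBag :: Bag a
emptyBag = Bag []

data Expr
  = LetScope String Expr
  | Decl String String Expr
  | Var String
  | Call Expr (Bag Expr) (Bag Expr)
  | OneOf Expr
  | Op (Bag Expr) (Bag Expr)
  | Collection Expr
  -- (Aside and Kw omitted; the lambda-calculus sugar doesn't use them)
  deriving (Eq, Show)

-- The sugar from the post, verbatim.
get :: String -> Expr
get name = OneOf (Var name)

lambda :: String -> Expr -> Expr
lambda param body =
  Collection (LetScope "s"
    (Op (unitBag (Decl "s" param (get "alias")))
        (unitBag body)))

call :: Expr -> Expr -> Expr
call fn val = Call fn (unitBag val) emptyBag

-- (\x. x) y
example :: Expr
example = call (lambda "x" (get "x")) (get "y")
```

Note how each variable use goes through `OneOf`: every variable is a collection, and each use takes one value out of it, which is what makes multi-use variables coexist with the linear machinery.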
For good measure, here’s an extension of that for the calculus of constructions:
```haskell
-- TODO: When implementing this model, special-case global variables
-- named "dlambda" and "phi", and disallow any OpCollection FlowNode
-- not guarded by a call to one of those primitives.

binder :: String -> String -> Expr -> Expr -> Expr  -- internal use
binder binderName param paramType body =
  Call (get binderName)
    (listToBag [Kw "given" paramType, Kw "infer" (lambda param body)])
    emptyBag

dlambda :: String -> Expr -> Expr -> Expr
dlambda = binder "dlambda"  -- dependent lambda

phi :: String -> Expr -> Expr -> Expr
phi = binder "phi"
```
Of course, I’m leaving the hard parts to that “implementing this model” step. I think that’s a good sign; this is supposed to be a mostly syntactic step, with little semantic investment.
This syntactic system doesn’t do much to support sum types, like having a conditional or pattern-matching expression. It’s much like proof nets for multiplicative linear logic (MLL) in this respect, and there’s bound to be some other proof net research I haven’t gotten to that could fill in some missing pieces. On the other hand, I suspect that system might be better off as a secondary layer after this one, since it shouldn’t result in a recursive graph so much as some kind of multiverse of subgraphs of a recursive graph. List processing is convenient, and recursive graph processing may be nicer for broader purposes, but… whatever-that-is processing? Maybe that can wait….