A Dataflow Syntax Sketch

Recently I’ve found it compelling to view programs in terms of their dataflow, since sometimes the whole dataflow graph can be generalized to some other purpose. I think it could be interesting to have a lisp variant whose surface syntax was not a tree but a dataflow graph. In my last post, I described this train of thought in more detail; now I’ll go into my own partial ideas for a generalized dataflow syntax.

The thing about syntax is, all the familiar syntaxes we use are flat enough to fit as text, and we group them by locality into hierarchies. That’s not something I’m trying to revolutionize right now; instead I’m looking for a way to construct these dataflow graphs using a surface syntax similar to what we already use.

The hierarchical representation

Here’s a rough sketch of an AST for this kind of syntax, as a Haskell type:

data Expr =
    LetScope String Expr             -- LetScope scopeName body
  | Decl String String Expr          -- Decl scopeName varName aliasOp
  | Var String                       -- Var varName
  | Call Expr (Bag Expr) (Bag Expr)  -- Call operation inArgs outArgs
  | Aside Expr Expr                  -- Aside sideCalculation body
  | OneOf Expr                       -- OneOf collection
  | Op (Bag Expr) (Bag Expr)         -- Op inArgs outArgs
  | Collection Expr                  -- Collection blueprint
  | Kw String Expr                   -- Kw keywordName value

With LetScope, we’re able to decouple the syntax for introducing a scope from the syntax for introducing a variable within that scope (Decl). Var is the counterpart of Decl: it uses an existing variable rather than declaring a new one.

The aliasOp parameter of Decl determines how the values going into that variable should transform into the values coming out of it. In most programming languages, a variable is just a name given to a single value, and that value is determined in one place and used in multiple places. In linear-typed programming, however, variables are often (or exclusively) single-use, with their values representing things that can’t be so easily duplicated or thrown away.

In order to make sense of both linear variables and multi-use variables, we use the OneOf syntax, which represents one value of a “collection,” where collections correspond roughly to the exponentials of linear typing. (In fact, this whole thing is a rough sketch, so in a working model it might actually correspond pretty well.) In order to let multiple kinds of variables be used in the same program, we specify a custom operation, aliasOp, that maps the variable’s input uses to its output uses.

In fact, every expression can be used in an “input use” or “output use,” or even a “neutral use” (as the sideCalculation expression of an Aside form). In an input use, the expression receives an input from its parent expression, like a pattern match. In an output use, the expression provides output to its parent expression, as in basic arithmetic expressions. In a neutral use, it provides neither input nor output; it communicates only through its own subexpressions (which nevertheless may still use variables to access the outside context).

The Call form lets us manually call an operation with certain inputs and outputs (including the input or output associated with the Call form’s parent expression). Note that these inputs and outputs are unordered bundles (multisets or bags), just as they would be if the operation were used to alias a variable.

As you might expect by now, the Op form outputs a value that can be used as an operation in Decl and Call forms. It defines this value in terms of a bag of inputs and a bag of outputs, which correspond to the parts of a Call form. In fact, the Op form defines only a single-use operation, and as such, the variables it captures are only augmented with a single input or output per capture.

In order to match up the expressions at the operation usage site with the expressions at the operation definition site, we may need to annotate them with additional disambiguating information. This is what Kw is for: It gives us a kind of nominal typing that amounts to a keyword argument system.
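To make the Call and Kw forms concrete, here’s a small self-contained sketch. The Bag type isn’t defined in this post, so I’m approximating it with a plain list, and the “divmod” operation with its “num”/“den”/“quot”/“rem” keywords is made up purely for illustration:

```haskell
-- Sketch only: Bag is approximated with a plain list here, and the
-- "divmod" operation and its keyword names are hypothetical.
type Bag a = [a]

data Expr =
    LetScope String Expr             -- LetScope scopeName body
  | Decl String String Expr          -- Decl scopeName varName aliasOp
  | Var String                       -- Var varName
  | Call Expr (Bag Expr) (Bag Expr)  -- Call operation inArgs outArgs
  | Aside Expr Expr                  -- Aside sideCalculation body
  | OneOf Expr                       -- OneOf collection
  | Op (Bag Expr) (Bag Expr)         -- Op inArgs outArgs
  | Collection Expr                  -- Collection blueprint
  | Kw String Expr                   -- Kw keywordName value
  deriving (Eq, Show)

-- Because the input and output bags are unordered, the Kw
-- annotations are what let a call site's arguments line up with the
-- definition site's parameters.
divmodCall :: Expr
divmodCall = Call (Var "divmod")
  [Kw "num" (Var "n"), Kw "den" (Var "d")]    -- inputs, in any order
  [Kw "quot" (Var "q"), Kw "rem" (Var "r")]   -- outputs, in any order
```

Reordering either bag would denote the same call, which is exactly why the keyword tags carry the disambiguating weight.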

We represent multi-use operations as collections of single-use operations. To define them (and other collections), we use the Collection syntactic form. Inside the blueprint of a collection, the scope is completely reset; it has no variables or scopes accessible to it. As an exception, if a OneOf form appears inside a Collection form and uses only variables and scopes which are not in scope inside that collection form but are in scope just outside it, that OneOf form is allowed. Effectively, we can only define a collection in terms of other collections.

The graph representation

The point of this syntactic system is to let us write a program hierarchically using familiar idioms and then immediately have it in a more graph-like form for manipulation. Here’s how that graph might look:

data FlowVar =
    Param String                     -- Param varName
  | Local String                     -- Local mangledVarName

data OpCollectionParam =
    Capture FlowVar  -- a collection in the surrounding scope
  | Input            -- an input of an element of this op collection
  | Output           -- an output of an element of this op collection

data FlowNode =
    NodeCall FlowVar (Bag FlowVar) (Bag FlowVar)
      -- NodeCall op ins outs
  | NodeOneOf FlowVar FlowVar      -- NodeOneOf collection out
  | NodeKw String FlowVar FlowVar  -- NodeKw keyword value out
  | NodeOpCollection
      (Map String OpCollectionParam) (Bag FlowNode) FlowVar
      -- NodeOpCollection subflowParams subflow out
  | NodeValCollection FlowVar FlowVar
      -- NodeValCollection constructorOpCollection out

A top-level Expr would be compiled into a (Bag FlowNode), with its free variables becoming Param FlowVars. This bag of FlowNodes describes a kind of static single-assignment form for the dataflow, with the assignee variables appearing in the “out” position. In fact, each FlowVar in the bag will only be used once, too; the FlowVars are the edges of the graph, connecting the FlowNodes. (These Haskell types don’t enforce these constraints, if you were wondering. Manual validation might be required, as in an untyped implementation.)
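For instance, here’s a hand-assembled sketch of such a graph (not output from a real compiler; Bag and Map are approximated with plain lists and association lists): applying a free operation f to a free value x, where the free variables become Params and the fresh edges become Locals.

```haskell
-- Sketch only: Bag and Map are approximated with lists here.
type Bag a = [a]
type Map k v = [(k, v)]

data FlowVar =
    Param String                     -- Param varName
  | Local String                     -- Local mangledVarName
  deriving (Eq, Show)

data OpCollectionParam =
    Capture FlowVar  -- a collection in the surrounding scope
  | Input            -- an input of an element of this op collection
  | Output           -- an output of an element of this op collection
  deriving (Eq, Show)

data FlowNode =
    NodeCall FlowVar (Bag FlowVar) (Bag FlowVar)
  | NodeOneOf FlowVar FlowVar
  | NodeKw String FlowVar FlowVar
  | NodeOpCollection (Map String OpCollectionParam) (Bag FlowNode) FlowVar
  | NodeValCollection FlowVar FlowVar
  deriving (Eq, Show)

-- Applying a free operation "f" to a free value "x": first take one
-- single-use element out of the collection f (a fresh Local edge),
-- then call it. Each Local appears once as an assignee ("out"
-- position) and once as a use, acting as an edge between nodes.
applyGraph :: Bag FlowNode
applyGraph =
  [ NodeOneOf (Param "f") (Local "f1")
  , NodeCall (Local "f1") [Param "x"] [Local "y"]
  ]
```

Note how the single-use discipline shows up: Local "f1" is assigned by the NodeOneOf and consumed by the NodeCall, and nowhere else.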

Once we have this graph, we should interpret/compile it according to some other semantic model: whatever model we like that makes sense with this kind of graph. It doesn’t need to make sense with all parts of this graph; I expect most models will disallow all top-level params except for a few, corresponding to the built-in operations and primitive values of the model. Many models should probably disallow the use of NodeKw as anything but a syntactic disambiguator, and strictly linear models might disallow collections altogether (possibly ending up with nothing but NodeCall!). Of course, whatever a model doesn’t disallow, it should implement.

Example sugar on the hierarchical side

Here’s how someone might define the familiar syntaxes of untyped lambda calculus on top of this syntactic system:

-- We actually don't care about explicit scope; we'll just have each
-- variable shadow any other with the same name in the parent scope.
-- All variables have the same kind of aliasing rules; their aliasing
-- operation takes one value and outputs one collection of any number
-- of the same value.
-- TODO: When implementing this model, special-case a global variable
-- named "alias", holding a collection of these free operations.
-- Furthermore, we only declare them as parameters of lambda
-- expressions, which are in fact our only abstractions and our only
-- scope-establishing constructs.
get :: String -> Expr
get name = OneOf (Var name)
lambda :: String -> Expr -> Expr
lambda param body = Collection (LetScope "s"
  (Op (unitBag (Decl "s" param (get "alias"))) (unitBag body)))
call :: Expr -> Expr -> Expr
call fn val = Call fn (unitBag val) emptyBag
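Here’s that sugar in action as a self-contained sketch, with Bag approximated as a plain list (unitBag and emptyBag become trivial helpers) and Eq/Show derived so we can inspect the result:

```haskell
-- Sketch only: Bag as a plain list, with trivial helpers.
type Bag a = [a]

unitBag :: a -> Bag a
unitBag x = [x]

emptyBag :: Bag a
emptyBag = []

data Expr =
    LetScope String Expr
  | Decl String String Expr
  | Var String
  | Call Expr (Bag Expr) (Bag Expr)
  | Aside Expr Expr
  | OneOf Expr
  | Op (Bag Expr) (Bag Expr)
  | Collection Expr
  | Kw String Expr
  deriving (Eq, Show)

get :: String -> Expr
get name = OneOf (Var name)

lambda :: String -> Expr -> Expr
lambda param body = Collection (LetScope "s"
  (Op (unitBag (Decl "s" param (get "alias"))) (unitBag body)))

call :: Expr -> Expr -> Expr
call fn val = Call fn (unitBag val) emptyBag

-- The identity function applied to a free variable y. Every
-- variable reference desugars to OneOf, i.e. taking one use out of
-- the variable's collection.
identityApp :: Expr
identityApp = call (lambda "x" (get "x")) (get "y")
```

Note that call passes emptyBag for the outputs: a lambda calculus application communicates its result through the Call form’s parent expression rather than through an explicit output argument.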

For good measure, here’s an extension of that for the calculus of constructions:

-- TODO: When implementing this model, special-case global variables
-- named "dlambda" and "phi", and disallow any OpCollection FlowNode
-- not guarded by a call to one of those primitives.
binder :: String -> String -> Expr -> Expr -> Expr  -- internal use
binder binderName param paramType body = Call (get binderName)
  (listToBag [Kw "given" paramType, Kw "infer" (lambda param body)])
  emptyBag
dlambda :: String -> Expr -> Expr -> Expr
dlambda = binder "dlambda"  -- dependent lambda
phi :: String -> Expr -> Expr -> Expr
phi = binder "phi"
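To check what this binder sugar expands to, here’s a self-contained sketch (Bag as a plain list again, listToBag as the identity, and Eq/Show derived; note that Call takes an output bag, empty here, as its third argument):

```haskell
-- Sketch only: Bag as a plain list.
type Bag a = [a]

unitBag :: a -> Bag a
unitBag x = [x]

emptyBag :: Bag a
emptyBag = []

listToBag :: [a] -> Bag a
listToBag = id

data Expr =
    LetScope String Expr
  | Decl String String Expr
  | Var String
  | Call Expr (Bag Expr) (Bag Expr)
  | Aside Expr Expr
  | OneOf Expr
  | Op (Bag Expr) (Bag Expr)
  | Collection Expr
  | Kw String Expr
  deriving (Eq, Show)

get :: String -> Expr
get name = OneOf (Var name)

lambda :: String -> Expr -> Expr
lambda param body = Collection (LetScope "s"
  (Op (unitBag (Decl "s" param (get "alias"))) (unitBag body)))

binder :: String -> String -> Expr -> Expr -> Expr
binder binderName param paramType body = Call (get binderName)
  (listToBag [Kw "given" paramType, Kw "infer" (lambda param body)])
  emptyBag

dlambda :: String -> Expr -> Expr -> Expr
dlambda = binder "dlambda"

-- A dependent lambda over a free type "a": the Kw tags pick out
-- which input is the parameter type and which is the body lambda.
example :: Expr
example = dlambda "x" (get "a") (get "x")
```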

Future work

Of course, I’m leaving the hard parts to that “implementing this model” step. I think that’s a good sign; this is supposed to be a mostly syntactic step, with little semantic investment.

This syntactic system doesn’t do much to support sum types, like having a conditional or pattern-matching expression. It’s much like proof nets for multiplicative linear logic (MLL) in this respect, and there’s bound to be some other proof net research I haven’t gotten to that could fill in some missing pieces. On the other hand, I suspect that system might be better off as a secondary layer after this one, since it shouldn’t result in a recursive graph so much as some kind of multiverse of subgraphs of a recursive graph. List processing is convenient, and recursive graph processing may be nicer for broader purposes, but… whatever-that-is processing? Maybe that can wait….


One thought on “A Dataflow Syntax Sketch”

  1. So you’re modeling dataflow generally via linear logic variables, including for the functions themselves. I could see that working, similar to Oz. Though, as you note, I think better support for sums would be essential.
