The paper introduces the continuation+state monad for code generation with let-insertion, and briefly describes abstract interpretation for generating efficient code without its intensional analysis. These are the foundational techniques for typeful multi-staged (MetaOCaml) programming.
Joint work with Walid Taha, Kedar N. Swadi, and Emir Pasalic.
Dynamic Programming Benchmark: The corresponding MetaOCaml source code
< http://www.metaocaml.org/examples/dp/ >
Can we generate optimal code in one pass, without further transformations such as common-subexpression-elimination, without looking into the code at all after it has been generated? The papers below answer affirmatively. The papers discuss in detail the generation of a straight-line C, Fortran, etc. code for the power-of-two FFT. The papers show that with the help of Abstract Interpretation we can exactly match the quality of code generated by the well-known system FFTW. The second paper demonstrates that our generated power-of-two FFT code has exactly the same number of floating-point operations as that in codelets of FFTW. The significant difference from FFTW is that we do not do any intensional code analysis -- the generated code is a black box and can not be processed nor manipulated any further. Moreover, the generated code cannot even be compared in equality. The reason for these restrictions is to maintain strong invariants about the generated code, e.g., that the generated code is automatically strongly typed and does not need to be typechecked. These invariants and the absence of ad hoc manipulation on the generated low-level code significantly increase our confidence in the result.
Unlike FFTW, we know precisely where savings are coming from, and which particular equivalences contribute to which improvements in the code.
The paper makes the case that ``there are benefits to focusing on writing better generators rather than on fixing the results of simple generators.''
The corresponding MetaOCaml source code
< http://www.metaocaml.org/examples/fft.ml >
Relating FFTW and Split-Radix.
Joint work with Walid Taha.
Proc. of ICESS'04, the First International Conference on Embedded Software and System, December 9-10 2004, Hangzhou (Zhejiang), China.
< http://www.cs.rice.edu/~taha/publications/conference/icess04.pdf >
FFTW: Fastest Fourier Transform in the West
< http://www.fftw.org >
Joint work with Jacques Carette.
< http://www.cas.mcmaster.ca/~carette/metamonads/index.html >
Besides the paper, the page points to the generator code, and the generated Gaussian Elimination codes to handle real and integer matrices using various pivoting strategies.
This work explores the promises of generative programming languages and techniques for the high-performance computing expert. We show that complex, architecture-specific optimizations can be implemented in a type-safe, purely generative framework. Peak performance is achievable through the careful combination of a high-level, multi-stage evaluation language -- MetaOCaml -- with low-level code generation techniques. Nevertheless, our results also show that generative approaches for high-performance computing do not come without technical caveats and implementation barriers concerning productivity and reuse. We describe these difficulties and identify ways to hide or overcome them, from abstract syntaxes to heterogeneous generators of code generators, combining high-level and type-safe multi-stage programming with a back-end generator of imperative code.
Joint work with Albert Cohen, Se'bastien Donadio, Mari'a Jesu's Garzara'n, Christoph Armin Herrmann, and David A. Padua.
< http://www-rocq.inria.fr/~acohen/publications/CDGHKP06.ps.gz >
From `Albert Cohen - Publications'
D a => a -> a where D is a Floating-like class, and produce a function of the same type that is the symbolic derivative of the former. The original function can be given to us in a separately compiled module with no available source code. The derived function is an ordinary numeric Haskell function and can be applied to numeric arguments -- or differentiated further. The Floating-like class D currently supports arithmetic and a bit of trigonometry. We also support partial derivatives.
test1f x = x * x + fromInteger 1
test1 = test1f (2.0::Float) -- 5.0
test1f' x = diff_fn test1f x
test1' = test1f' (3.0::Float) -- 6.0
test1f'' x = diff_fn test1f' x
test1'' = test1f'' (10.0::Float) -- 2.0
The original function test1f can be evaluated numerically, test1, or differentiated symbolically, test1f'. The result is again an ordinary numeric function (i.e., 2x), which can be applied to a numeric argument, see test1'. Alternatively, test1f' can be differentiated further.
The original function is emphatically not represented as an algebraic data type -- it is a numeric function like sin or tan. Still, we are able to differentiate it symbolically (rather than numerically or automatically). The key insight is that Haskell98 supports a sort of reflection -- or, to be precise, type-directed partial evaluation and hence term reconstruction.
Our approach also shows off the specification of the differentiation rules via type classes (which makes the rules extensible) and the emulation of GADT via type classes. In 2006, we improved the approach by developing an algebraic simplifier and by avoiding any interpretative overhead.
Most optimal symbolic differentiation of compiled numeric functions
Our approach is reifying code into its `dictionary view', intensional analysis of typed code expressions, and staging so to evaluate under lambda. We improve the earlier, 2004 approach in algebraically simplifying the result of symbolic differentiation, and in removing interpretative overhead with the help of Template Haskell (TH). The computed derivative can be compiled down to the machine code and so it runs at full speed, as if it were written by hand to start with.
In the process, we develop a simple type system for a subset of TH code expressions (TH is, sadly, completely untyped) -- so that accidental errors can be detected early. We introduce a few combinators for the intensional analysis of such typed code expressions. We also show how to reify an identifier like (+) to a TH.Name -- by applying TH to itself. Effectively we obtain more than one stage of computation.
Our technique can be considered the inverse of the TH splicing operation: given a (compiled) numeric expression of a host program, we obtain its source view as a TH representation. The latter can be spliced back into the host program and compiled -- after, perhaps, simplification, partial evaluation, or symbolic differentiation. As an example, given the definition of the ordinary numeric, Floating a => a->a function test1f x = let y = x * x in y + 1 (which can be located in a separately compiled file), we reify it into a TH code expression, print it, differentiate it symbolically, and algebraically simplify the result:
*Diff> test1cp
\dx_0 -> GHC.Num.+ (GHC.Num.* dx_0 dx_0) 1
*Diff> test1dp
\dx_0 -> GHC.Num.+ (GHC.Num.+ (GHC.Num.* 1 dx_0) (GHC.Num.* dx_0 1)) 0
*Diff> test1dsp
\dx_0 -> GHC.Num.+ dx_0 dx_0
The output is produced by TH's pretty-printer. We can splice the result in a Haskell program as $(reflectQC test1ds) 2.0 and use it as ordinary, hand-written function \x -> x+x.Diff.hs [9K]
The main implementation file, which defines the reification of code into TH representation, differentiation rules and algebraic simplification rules, all via the intensional analysis of the typed code. The file also includes many examples, including those of partial differentiation.
DiffTest.hs [<1K]
Running the splicing tests from Diff.hs. Due to the TH requirement, this code must be in a separate module.
TypedCode.hs [5K]
This file introduces the type Code a of typed TH code expressions. The (phantom) type parameter is the expression's type. The file defines combinators for building and analyzing these typed expressions.
TypedCodeAux.hs [2K]
Obtain the Name that corresponds to a top-level (Prelude-level) Haskell identifier by applying TH to itself.
Here is the simplest example of scope extrusion in MetaOCaml caused by the effect of mutating a state:
# let code = let x = ref .<1>. in
let _ = .<fun v -> .~(x := .<v>.; .<()>.)>. in
!x;;
val code: ('a, int) code = .<v_1>.
# .!code;;
Unbound value v_1
Exception: Trx.TypeCheckingError.
We have managed to build a piece of code with literally a free variable. Evaluating this code (by MetaOCaml's `run' operation .!) causes a paradoxical type error at run-time. The type-checker has accepted the code that it should not have. We have seen such scope extrusion arising from honest mistakes of let-insertion in real program generation with effects. No staged language today can statically prevent such mistakes.
Our goal is to make program generation convenient and safe, in particular, to statically prevent errors like scope extrusion. Developing a sound type system of staged code with effects requires a suitable calculus. To model real staged programming languages like MetaOCaml, we need a call by value calculus that supports splicing of open code, running the closed code, and cross-stage persistence. Modeling effects, especially control effects such as delimited continuations, is better with small-step operational semantics. Many formal calculi for staged programming have evolved over the last couple of decades. Here is a sample:
Our goal for this project is to make, with the benefit of hindsight, the existing staged calculi more uniform and their features more orthogonal to each other, so that they are easier to study, mechanize, and extend (for example, to add side effects). To be more precise, we aim to:
Joint work with Chung-chieh Shan and Yukiyoshi Kameyama.
This paper takes a first step towards solving the problem, by translating the staging away. Our source language models MetaOCaml restricted to one future stage. It is a call-by-value language, with a sound type system and a small-step operational semantics, that supports building open code, running closed code, cross-stage persistence, and non-termination effects. We translate each typing derivation from this source language to the unstaged System F with constants. Our translation represents future-stage code using closures, yet preserves the typing, alpha-equivalence (hygiene), and (we conjecture) termination and evaluation order of the staged program.
To decouple evaluation from scope (a defining characteristic of staging), our translation weakens the typing environment of open code using a term coercion reminiscent of Goedel's translation from intuitionistic to modal logic. By converting open code to closures with typed environments, our translation establishes a framework in which to study staging with effects and to prototype staged languages. It already makes scope extrusion a type error.
Joint work with Yukiyoshi Kameyama and Chung-chieh Shan.
Chung-chieh Shan. Slides of the talk at PEPM, January 7, 2008.
< http://www.cs.rutgers.edu/~ccshan/metafx/talk.pdf >
paper-examples.ml [11K]
Examples of the staged code and its translation. This file contains the complete code for all the examples in the paper plus extra examples.
power-count.ml [2K]
Computing a staged power function while tracking the number of multiplications. The example in Sec 6 of the paper.
Our solution represents contexts outside-in and uses dependent types to describe the binding structure of contexts and the corresponding structure of open terms. We convinced Twelf that our our functions to decompose a term into an (open) redex and its context, and to plug an open term into its closing context are total. This totality suggests that our types adequately represent ordered dependent sequences of bindings, be they needed by an expression or provided by a context. These focusing and zipping functions let us specify the first small-step semantics for staging.
The challenge remains to represent contexts inside-out while expressing its binding structure, in particular how the continuation of a staged evaluator may ``bind off'' a later-stage variable.
Joint work with Chung-chieh Shan.
Here is a sample MetaOCaml code and its Scheme translation
# let eta = fun f -> .<fun x -> .~(f .<x>.)>.
in .<fun x -> .~(eta (fun y -> .<x + .~y>.))>.
- : ('a, int -> int -> int) code = .<fun x_1 -> fun x_2 -> (x_1 + x_2)>.
(let ((eta (lambda (f) (bracket (lambda (x) (escape (f (bracket x))))))))
(bracket (lambda (x) (escape (eta (lambda (y) (bracket (+ x (escape y)))))))))
Translating MetaOCaml code into Scheme seems trivial: code values are like S-expressions, MetaOCaml's bracket is like quasiquote, escape is like unquote, and `run' is eval. If we indeed replace bracket with the quasiquote and escape with the unquote in the above Scheme code and then evaluate it, we get the S-expression (lambda (x) (lambda (x) (+ x x))), which is a wrong result, quite different from the code expression .<fun x_1 -> fun x_2 -> (x_1 + x_2)>. produced by MetaOCaml. The latter is the code of a curried function that adds two integers. The S-expression produced by the naive Scheme translation represents a curried function that disregards the first argument and doubles the second. This demonstrates that the often mentioned `similarity' between the bracket and the quasiquote is flawed. MetaOCaml's bracket respects alpha-equivalence; in contrast, Scheme's quasiquotation, being a general form for constructing arbitrary S-expressions (not necessarily representing any code), is oblivious to the binding structure.
Our implementation still uses S-expressions for code values (so we can print them and eval-uate them) and treats escape as unquote. To maintain the hygiene, we need to make sure that every run-time evaluation of a bracket form such as (bracket (lambda (x) x)) gives '(lambda (x_new) x_new) with the unique bound variable x_new. Two examples in the source code demonstrate why static renaming of manifestly bound identifiers is not sufficient. We implement the very clever suggestion by Chung-chieh Shan and represent a staged expression such as .<(fun x -> x + 1) 3>. by the sexp-generating expression `(,(let ((x (gensym))) `(lambda (,x) (+ ,x 1))) 3) which evaluates to the S-expression ((lambda (x_1) (+ x_1 1)) 3). Thus bracket is a complex macro that transforms its body to the sexp-generating expression, keeping track of the levels of brackets and escapes. The macro bracket is implemented as a CEK machine with the defunctionalized continuation. In our implementation, the Scheme translation of the eta-example yields the S-expression (lambda (g6) (lambda (g8) (+ g6 g8))). Just as in MetaOCaml, the result represents the curried addition.
CSP poses another implementation problem. In MetaScheme we can write the MetaOCaml expression .<fun x -> x + 1>. as (bracket (lambda (x) (+ x 1))), which yields an S-expression (lambda (g1) (+ g1 1)). When we pass this code to eval, the identifier + will be looked up in the environment of eval, which is generally different from the environment that was in effect when the original bracket form was evaluated. That might not look like much of a difference since + is probably bound to the same procedure for adding numbers in either environment. This is no longer the case if we take the following MetaOCaml code and its putative MetaScheme translation:
let test = let id u = u in
.<fun x -> id x>.
(define test (let ((id (lambda (u) u)))
(bracket (lambda (x) (id x)))))
The latter definition binds test to the S-expression (lambda (x) (id x)) that contains a free identifier id, unlikely to be bound in the environment of eval. Our code values therefore should be `closures', being able to include values that are (possibly locally) bound in the environment when the code value was created. Incidentally, the problem of code closures closed over the environment of the generator also appears in syntax-case macros. R6RS editors made a choice to prohibit the occurrence of locally-bound identifiers in the output of a syntax-case transformer.
MetaOCaml among other staged systems does permit the inclusion of values from the generator stage in the generated code. Such values are called CSP; they are evident in the output of the MetaOCaml interpreter for the above test:
val test : ('a, 'b -> 'b) code =
.<fun x_1 -> (((* cross-stage persistent value (as id: id) *)) x_1)>.
In MetaScheme, we have to write CSP explicitly: (% e), which is often called `lift'.
(define test
(let ((id (lambda (u) u)))
(bracket (lambda (x) ((% id) x)))))
One may think that such a lifting is already available in Scheme: (define (lift x) (list 'quote x)). Although this works in the simple cases of x being a number, a string, etc., it is neither universal nor portable: attempting this way to lift a closure or an output-port could be an error. According to R5RS, the argument of a quotation is an external representation of a Scheme datum. Closures, for example, are not guaranteed to have an external representation. For portability, we implement CSP via a reference in a global array, taking advantage of the fact that both the index (a number) and the name of the array (an identifier) have external representations and hence are trivially liftable by quotation. This is precisely the mechanism used by the current version of MetaOCaml.
The source code contains many more examples of the translation of MetaOCaml code into MetaScheme -- including the examples with many stages and CSP.
Joint work with Chung-chieh Shan.
We specify the compilation from the intermediate language to the STT denotation as reduction rules. Hence the intermediate language is (call-by-name) lambda calculus whose constants are STT formulas. To compile an intermediate language term to its STT denotation we evaluate the term. The compilation is not always compositional: so-called scopal expressions contribute their meaning in the places other than where they are seen or heard. Scopal phenomena include generalized quantification, wh-phrases, topicalization, focus, anaphora and binding. To account for these phenomena, our intermediate language includes control operators shift and reset. We are thus able to improve on the previous continuation-based analyses of quantification, binding, raised and in-situ wh-questions, binding in wh-questions, and superiority.
The evaluation of an intermediate language term does not always lead to the STT denotation: the evaluation may get stuck. In that case, we regard the corresponding source language phrase as ungrammatical. We introduce a type system to delineate a set of intermediate language terms whose evaluation never fails to produce denotation. Being well-typed becomes an alternative criterion of grammaticality. Typeability, unlike reducibility, has an advantage of being applicable to separate phrases rather than whole sentences. Our main result is that both typing and call-by-name are necessary to correctly predict superiority and binding in wh-questions with topicalization, without resorting to thunking or type raising and thus maintaining the uniformity of the analyses.
gengo-cbn-talk.pdf [224K]
Presentation at the workshop, August 6, 2008. Hamburg, Germany.
gengo-side-effects-cbn.elf [19K]
The complete code with all the analyses described in the paper
Call-by-name typed shift/reset calculus
Our intermediate language
oleg-at-pobox.com or oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!
Converted from HSXML by HSXML->HTML