We describe a less-known way of implementing typed embedded domain-specific languages in common metalanguages, stressing type preservation, typed compilation, and multiple interpretations. Type preservation statically and patently assures that interpreters never get stuck and hence run more efficiently.
Our main idea is to encode de Bruijn or higher-order abstract syntax using cogen functions rather than data constructors. In other words, we represent object terms not in an initial algebra but using the coalgebraic structure of the lambda-calculus. Our representation also simulates inductive maps from types to types, which are required for typed partial evaluation and CPS transformations.
Our encoding of an object term abstracts over the various ways to interpret it, yet statically assures that the interpreters never get stuck. To achieve self-interpretation and show Jones-optimality, we relate this exemplar of higher-rank and higher-kind polymorphism (provided by ML functors and Haskell~98 constructor classes) to plugging a term into a context of let-polymorphic bindings.
Joint work with Jacques Carette and Chung-chieh Shan.
tagless-final/
README.txt [3K]
Commented code accompanying the paper, with the complete implementations of all interpreters.
Chung-chieh Shan. Slides of the talk at APLAS, November 30, 2007. Singapore.
< http://www.cs.rutgers.edu/~ccshan/tagless/talk.pdf >
Chung-chieh Shan: Embedding languages. Talk at Rutgers computer science department, December 13, 2007.
< http://www.cs.rutgers.edu/~ccshan/quote/language.pdf >
Chung-chieh Shan: Translations. Blog post, August 17, 2007.
< http://conway.rutgers.edu/~ccshan/wiki/blog/posts/Translations/ >
The article elucidates typed tagless interpretations as a way to relate form with meaning in natural languages. An abstract syntax expression may be interpreted either to yield an utterance or text, or to yield a semantic denotation. Types (in linguistics, categories) ensure the well-formedness of forms, expressions, and denotations. More precisely, typed tagless interpretation turns out to closely related to abstract categorial grammars (ACGs).
Typed compilation is a generalization of typechecking (the latter merely verifies if an expression is well-typed). Typed compilation is relied upon when dynamically loading (typed) code or implementing typed DSL. Typed compilation is inherently a partial operation (a source term may be ill-typed) and it has to deal with (run-time) type representations. The result of compilation, however, has no type-related partiality and needs no type information. `Typecheck once, assuredly interpret many times' is our main goal. It is achievable, albeit not simply. We follow the seminal work of Baars and Swierstra on Typing Dynamic Typing.
Conceptually, the typed compiler is a function of the signature typecheck :: Expr -> Term t, whose three concrete examples are given below. One must distinguish this function from a similar, also partial function my_read :: Expr -> Term t. Although the latter has the same signature, the intent is different. When the user writes my_read exp, the user is supposed to specify the type of the desired result. Just as when we write read 1 in Haskell, we are supposed to specify the type of the expected value: Int, Integer, etc. The function typecheck has a different intent: it computes the result type as well as derives the term of that type, the compilation result. The function indeed acts as a compiler of a typed language. The computed result type Term t is a function of the structure of the untyped source expression -- that is, on the face of it, a dependent type. Our solutions are expressed in the language (Haskell) that is not normally thought of as a dependently typed language.
Arthur I. Baars and S. Doaitse Swierstra: Typing Dynamic Typing. ICFP 2002.
Metatypechecking: Staged Typed Compilation into GADT using typeclasses
typecheck :: forall repr. Symantics repr =>
UExp -> Gamma -> Either String (DynTerm repr)
data DynTerm repr = forall a. (Show a, Typeable a) => DynTerm (repr a)
Given the untyped term and the type environment, typecheck returns either a type error message, or the compiled term and the representation of its computed type. The compilation result can be interpreted by any Symantics interpreter assuredly without any pattern-match errors.
The biggest problem is compiling higher-order untyped terms such as Lam "x" (UAdd (UVar "x") (UInt 1)) into typed higher-order abstract syntax, e.g., lam (\x -> add x (int 1)). The naive approach such as
DynTerm (lam (\x -> unDynTerm $ typecheck (UAdd (UVar "x") (UInt 1)) [("x",x)]))
fails in our goal of compiling only once: here, typecheck is invoked every time the compiled function is applied. Since the typechecking is partial, we can no longer assure that the result of a successful compilation can be interpreted without any type-related errors.
Nevertheless, we have accomplished our task, without relying on any built-in staging, GADTs or type classes with functional dependencies. Of all the extensions to Haskell98, we only use existentials and Typeable. The code shows uncanny similarities with staging. Furthermore, it seems we need staging or its emulation for the typed compilation into higher-order abstract syntax. Both type-checking and staging have to manipulate open code, whose free variables are hypotheses, or type abstractions.
Another remarkable part of the tagless final compilation is its close relationship with proofs in logic. Typechecking is truly building a proof, using hypothetical reasoning. Moreover, our typed compiler must be explicit with weakening (to extend the typing environment) and the application of structural rules (which take the form of cut-rules). The result of compiling the body of the function happens do be the same as that of compiling the whole function, which is the statement of the Deduction Theorem.
Finally, our DynTerms are essentially Data.Dynamic, only with more operations and implemented in pure Haskell.
IncopeTypecheck.hs [18K]
Extensively commented Haskell code. The title comments explain the problem of HOAS typechecking, the solution, and its connection to staging. The code requires Incope.hs, see below.
Incope.hs [21K]
Tagless final interpreters. The archive includes Incope.hs, which defines our source language and several its interpreters.
Closing the Stage: from staged code to typed closures
The translation of the staged code to an unstaged language, too, requires manipulation of `open' code and the explicit weakening of the binding environment. The weakening is called coercion in the paper `Closing the Stage'.
class Eval gamma exp result | gamma exp -> result where
eval :: exp -> gamma -> result
Our code has been greatly inspired by the ICFP 2002 paper by Pasalic, Taha, and Sheard on staged tagless interpreters. The paper gives the most lucid explanation of the tagging problem in typed interpretation. Although the paper develops a dependently typed language Meta-D for writing typed tagless interpreters, the paper itself gives hints that dependent types are not really necessary. The key phrase was about indexing types by singleton types rather than by terms. The former is easily implementable in Haskell as it is. The introduction section gave the other hint: the apparent problem with the eval function is that it should yield an Int when evaluating the literal constant expression B 1 and yield a function when evaluating the term L "x" (Var "x"). Indeed no ordinary function can return values of different types. But an overloaded function can, e.g., Haskell's read.
With the help of Template Haskell, we stage our tagless code to remove its interpretative overhead. Because expressions in Template Haskell are untyped, we add a newtype wrapper to maintain their types. Our staged interpreter deals exclusively with these typed code expressions, to be faithful to the Pasalic et al. paper. Template Haskell can print code values, so we can see the staged result: the `compiled' code. In particular, here is the running example of the paper and the result of its evaluation with our staged interpreter:
stest4 = show_tcode $ seval (L (TArr TInt TInt) (V Z)) HNil
*Staged> stest4
\x_0 -> x_0
There are indeed no tags. Here is another test:
stest3 = show_tcode $ seval (A (L TInt (V Z)) (B 2)) HNil
*Staged> stest3
(\x_0 -> x_0) 2#
If we change TInt above to (TArr TInt TInt), we get a type error before running stest3: The typing is done at the meta-level.
The present code was the first attempt to define tagless interpreters in a language without (overt) dependent types. This work has continued in cooperation with Jacques Carette and Chung-chieh Shan. We showed that writing typed interpreters becomes significantly simpler if we change the building blocks of object language terms, from data constructors to constructor functions.
Interp.hs [3K]
The tagless typed interpreter for the the typed language of the above paper, viz. simply-typed lambda-calculus with de Bruijn indices. The interpreter is deliberately patterned after the one in the paper, including the type-level function TypEval. The code almost literally implements the Meta-D interpreter from Fig. 3 of the paper and the typing rules from Fig. 1 -- without any dependent types.
Staged.hs [3K]
The staged tagless typed interpreter.
Implicit configurations -- or, type classes reflect the values of types. Haskell Workshop 2004. Joint work with Chung-chieh Shan.
The paper demonstrates how easy it is to introduce type families indexed by singleton types in Haskell as it was in 2003: Haskell98 extended with multi-parameter type classes with functional dependencies.
Expr -> Term t. Here Expr is an ordinary algebraic data type of untyped source terms (the first-order language described in Peyton-Jones et al, ICFP 2006), and Term t is a GADT, the typed representation.
data Expr = ELit Int | EInc Expr | EIsZ Expr | ...
data Term t where
Lit :: Int -> Term Int
Inc :: Term Int -> Term Int
IsZ :: Term Int -> Term Bool ...
The result type t is a function of the value of Expr. Thus we demonstrate the Haskell solution of the truly dependent-type problem.
Although the result of the typed compilation is a GADT, the compiler itself uses neither GADTs nor representation types. The compiler is made of two parts. The first is the conversion from Expr to `lifted' terms:
newtype FLit = FLit Int
newtype FInc e = FInc e
newtype FIsZ e = FIsZ e
implemented with the help of Template Haskell:
type F = Q TH.Exp
parse :: Expr -> F
parse (ELit x) = [e| FLit $(litE . integerL . fromIntegral $ x) |]
parse (EInc x) = [e| FInc $(parse x) |]
parse (EIsZ x) = [e| FIsZ $(parse x) |]
The only inconvenience of using the Template Haskell is the necessity of splitting the whole code into two modules.
The main part is the typechecker class, which computes both the type of the result t and the corresponding value of the type Term t, the compilation result. The typechecking rules are stated as instances of the type class:
class TypeCheck e t | e -> t where
typecheck :: e -> Term t
instance TypeCheck FLit Int where
typecheck (FLit x) = Lit x
instance TypeCheck e Int => TypeCheck (FInc e) Int where
typecheck (FInc e) = Inc (typecheck e)
instance TypeCheck e Int => TypeCheck (FIsZ e) Bool where
typecheck (FIsZ e) = IsZ (typecheck e)
It is remarkable that the instances express the type checking rules as judgments, almost in the form they are commonly presented in papers. For example, the IsZ rule would be written in a paper as
|- e : int
---------------
|- IsZ e : bool
(We do not need the environment Gamma as our language is first order.)
Given two sample terms (as strings), we parse and typed-compile them:
e1 = "EIf (ELit 1) (ELit 2) (ELit 3)"
e2 = "(EIf (EIsZ (ELit 0)) " ++
" (EInc (ELit 1)) " ++
" (EFst (EPair (ELit 42) (ELit 43))))"
t1' = $(parse . read $ e1)
t2' = $(parse . read $ e2)
-- tt1 = typecheck t1' -- gives a type error
tt2 = typecheck t2'
*G> :t tt2
tt2 :: Term Int
Obviously, the term e1 is ill-typed: the test expression of a conditional should be a boolean rather than an integer. Therefore, the expression typecheck t1 gives a type error: ``Couldn't match expected type Bool against inferred type Int''. In contrast, the typechecking (of the result of the parsing) of e2 succeeds. The computed type of the compilation result is Term Int.
We stress that the typechecking of the embedded DSL is carried out by the Haskell typechecker! It is the latter that applies the typing judgments of our DSL expressed as the instances of the class TypeCheck. We thus succeeded in embedding into Haskell not only DSL terms but also the DSL type system.
TypeCheck.hs [3K]
TermLift.hs [2K]
The complete implementation. The code was originally posted as An example of dependent types [was: Simple GADT parser for the eval example] on the Haskell-Cafe mailing list on Wed, 1 Nov 2006 00:30:56 -0800 (PST).
typecheck :: Gamma -> Exp -> Either String TypedTerm
data TypedTerm = forall t. TypedTerm (Typ t) (Term t)
The compiler takes the type environment and the untyped term, the value of the ordinary algebraic data type Exp, and returns either the type error message, or the compiled term and the representation of its computed type. Both Typ t and Term t are GADTs. Although to Haskell TypedTerm is just as `untyped' as Exp, the latter is `deeply' untyped whereas the former is only superficially. The former value is built out of typed components, and the type is erased only at the end. That fact guarantees that an evaluator of Term t, e.g., eval :: Term t -> t never gets stuck. Combining the evaluator with the typed compiler and the show function gives the complete DSL interpreter teval :: Exp -> String
te3 = EApp (EApp (EPrim "+") (EApp (EPrim "inc") (EDouble 10.0)))
(EApp (EPrim "inc") (EDouble 20.0))
te4 = EApp (EPrim "rev") te3
*TypecheckedDSL> teval te3
"type Double, value 32.0"
*TypecheckedDSL> teval te4
"Type error: incompatible type of the application: (String->String) and Double"
Upon success, the compiler returns a typed term wrapped in an existential envelope. Although we can evaluate that term, the result is not truly satisfactory because the existential type is not `real'. We cannot write
case (typecheck te3) of Right (TypedTerm t e) -> sin (eval e)although we know that
e has the type Term Double and so applying the function sin to the result of eval e is well-typed. To the Haskell typechecker however, the type of eval e is some abstract type t rather than Double. Fortunately, Template Haskell lets us convert from an existential to a concrete type. This is equivalent to implementing an embedded compiler for our DSL. Since TypedTerm has already been typechecked, we are guaranteed the absence of errors during Template-Haskell-based `code generation'. The compiler, of the signature tevall :: Exp -> ExpQ, can be used as follows
tte3 = $(tevall te3)
:t tte3
tte3 :: Term Double
ev_tte3 = eval tte3
-- 32.0
testr = sin (eval tte3)
-- 0.5514266812416906
The key part of the implementation is the Equality GADT and checking if two DSL types are the same, and if so, computing the proof (the witness).
data EQ a b where Refl :: EQ a a
eqt :: Typ a -> Typ b -> Maybe (EQ a b)
The Template Haskell compiler also relies of the function lift'self :: Term a -> ExpQ satisfying the equation $(lift'self term) == term. It takes only four lines of code to define this function.
TypecheckedDSL.hs [4K]
The complete implementation of a typed DSL with the typed evaluator and the typed compiler from untyped terms to GADT. The code was originally posted as Typechecker to GADT: the full implementation of a typed DSL on the Haskell-Cafe mailing list on Thu Oct 4 02:02:32 EDT 2007.
TypecheckedDSLTH.hs [7K]
TypedTermLiftTest.hs [<1K]
The complete code for the compiler of the typed DSL, using Template Haskell to `compile' GADT to `machine code'. The code was originally posted as Typed DSL compiler, or converting from an existential to a concrete type on the Haskell-Cafe mailing list on Sat Oct 6 03:55:36 EDT 2007.
In the initial approach, typed terms are represented by GADTs. The absence of type-tag mismatch errors is the central property of GADT. The absence of unbound variable reference errors is assured either by higher-order abstract syntax (Xi et al., POPL 2003) or de Bruijn indices and dependent types (Pasalic et al., ICFP 2002). This page has described the final tagless approach. Type-tag mismatch errors are patently absent because there are simply no type tags and hence no possibility of type errors during interpretation. The absence of the second sort of errors can likewise be assured by higher-order abstract syntax (used here) or de Bruijn indices.
We demonstrate that the final and initial typed tagless representations are related by bijection. We use the higher-order language of the Tagless Final paper (APLAS 2007), which is the superset of the language introduced in Xi et al (POPL 2003). In the latter paper, the tagless interpretation of the language was the motivation for GADT. In a bit abbreviated form, the final and the initial representations of our DSL are defined as follows:
class Symantics repr where
int :: Int -> repr Int
lam :: (repr a -> repr b) -> repr (a->b)
app :: repr (a->b) -> repr a -> repr b
fix :: (repr a -> repr a) -> repr a
add :: repr Int -> repr Int -> repr Int
data IR h t where
Var :: h t -> IR h t
INT :: Int -> IR h Int
Lam :: (IR h t1 -> IR h t2) -> IR h (t1->t2)
App :: IR h (t1->t2) -> IR h t1 -> IR h t2
Fix :: (IR h t -> IR h t) -> IR h t
Add :: IR h Int -> IR h Int -> IR h Int
The data constructor Var of the initial representation corresponds to HOASLift of Xi et al. The initial representation is parameterized by the type of the hypothetical environment h: h t is the type of an environment `cell' holding a value of the type t.
The relation between the two representations is established as follows:
instance Symantics (IR h) where
int = INT
lam = Lam
app = App
fix = Fix
add = Add
itf :: Symantics repr => IR repr t -> repr t
itf (Var v) = v
itf (INT n) = int n
itf (Lam b) = lam(\x -> itf (b (Var x)))
itf (App e1 e2) = app (itf e1) (itf e2)
itf (Fix b) = fix(\x -> itf (b (Var x)))
itf (Add e1 e2) = add (itf e1) (itf e2)
We note the properties of the mappings from the final to the initial and vice versa: both mappings are total and a composition of one mapping with the other preserves interpretations. The code below gives concrete examples of that preservation. The totality is especially easy to see for the mapping from the final to the initial, since the mapping looks like identity. The mapping is one of many possible interpretations of a term in the final tagless form.
oleg-at-pobox.com or oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!
Converted from HSXML by HSXML->HTML