NOTE: If you’d install remu_ts
and remu_scope
, clone them from RemuLang and use opam install .
to install them locally.
FSYM: An Abstraction On Tagless-Final Style To Compositing And Decoupling Multiple Interpretations¶
Lamu0
is a very simple and basic programming language, but the implementation in plfp/lamu0 shows an approach to use Tagless Final to
- handle the case that the order of interpretations is significant.
- allow the composition of decoupled compiler phases
Given the grammar
type litype = IntT | FloatT | StringT
module type SYM = sig
type repr
val letl : string -> repr -> repr -> repr
val lam : string -> repr -> repr
val app : repr -> repr -> repr
val lit : litype -> string -> repr
val var : string -> repr
end
Terminology¶
The following terms represent the same thing in this presentation.
- interpretation (from a separate view of compiler)
SYM
(from the view of routine implementation in OCaml)- algebra (from a view of mathematics)
- compiler phase(from an overall view from compiler)
Background¶
The steps through type repr
to the executable low level instructions are complex:
Usually, a phase A
should be performed prior to some phase B
, so B
depends on A
.
For instance, if the last phase is back end code generation, it for sure depends on all of other phases.
A -> B -> ... -> C
\ \ |
.. .. |
\ \|/
CodeGen
for each phase, it's either prior to phase CodeGen or phase CodeGen itself.
Another intuituve example, to perform type checking/inference, we shall understand the scope information firstly.
---------------------
| |
let x = 1 | identify the occurrences of
let y = 2 | a type variable
in y + x-------------|
in x---------------------
More instances could be found in the real world:
- All type-directed code generations require the information from (partially) type inference.
- The compilation for type classes, in the approach of dictionary passing, requires type information, and a scope for class instances.
- The closure conversion needs scope information to get the free variables of a function.
- etc…
The tagless final approach works for the polymorphisms of interpreting a given “grammar”, however, it lacks of the facilities to work with multuple separate interpretations, like
- compositing separate interpretations
- resolving the dependency relationships among dependent interpretations
- decoupling the dependent interpretations as much as possible
To address these problems, I proposed
- an operation(
grow
) among the algebras to make reductions, - a module signature(
FSYM
) very close to the shape ofSYM
to composite and decouple interpretations.
which could be taken advantage of to solve above 3 pain spots fairly easy.
Introduction¶
The core idea is the generalization of fold
operation,
and I call it grow
.
grow (grow (grow zero fst) snd) third
, where zero
, grow (zero, fst)
, and similar stuffs are algebras(module SYM
),
however, all of them have distinct representations(usually written as type repr
in a SYM
).
In this design, for grow(m, mexpander)
, where m
is an algebra,
and mexpander
can produce a new algebra(e.g., scope->scope+type
) by composing with m
via grow
.
Something occurs naturally and simultaneously during expanding the algebra is, the expansion of the representation type(type repr
in an algebra).
A : module SYM with type repr = o (* base *)
B : module SYM with type repr = r (* result *)
We check the “delta” from o = A.repr
to r = B.repr
,
and represent it by type c
, in other words:
grow
transforms the type repr
in the algebra(SYM
) from o
to r
, with a delta c
, while transforming the interpretation from A
to A+B
.
We extract the type expanding function out:
val combine : o -> c -> r
And extract the interpretation expanding function out:
val grow: 'o 'c 'r.
(module SYM with type repr = 'o) ->
(module FSYM with type o ='o and type c = 'c and type r = 'r) ->
(module SYM with type repr = 'r)
So the current goal is to extract the structure of FSYM
, and make sure it satisfy our final goals:
compositing separate interpretations:
This is already explicit and natural, according the type of
grow
function.resolving the dependency relationships among dependent interpretations:
We’re supposed to make sure
FSYM
can use the interpreted result ofmodule SYM with type repr = o
to implement the interpretionmodule SYM with type repr=r
.decoupling the dependent interpretations as much as possible:
We’re supposed to decouple the implementation of
module SYM with type repr = o
andFSYM
.That is to say, we don’t have to, and even shouldn’t know or aware the implementation of
module SYM with type repr = o
when we’re impelementingFSYM
.
We point out that, the following signature would suffice
module type FSYM = sig
type r
type c (*delta*)
type o
val combine : o -> c -> r
val project : r -> o
val letl : o -> string -> r -> r -> c
val lam : o -> string -> r -> c
val app : o -> r -> r -> c
val lit : o -> litype -> string -> c
val var : o -> string -> c
end
Deriving an FSYM
from SYM
is trivial:
Besides the common part
sig
type o
type c (* delta from o to c *)
type r
val combine : o -> c -> r
val project : r -> o
end
Any operator of type a -> b -> ... -> r
in SYM
,
is supposed to be transformed to o -> a -> b -> ... -> c
in FSYM
,
where the symbols o, c, r
keep the same meanings as the aforementioned:
o
is the originalrepr
of the original algebraA
,r
is therepr
of the result algebra(A+B
) transformed bygrow(A, B)
,c
is the delta of the change fromo
tor
. There’s no algebra likeSYM
, but exactly a functorA->A+B
.
To elaborate, we can use lam
operator as an examplar:
in
SYM
/algebra/interpretationA
:val lam_1: string -> o -> o
in
FSYM
,A->A+B
:val lam_2: o -> string -> r -> c
in
SYM
/algebra/interpretationA+B
val lam_3: string -> r -> r
We now need to implement lam_3
via lam_1
and lam_2
.
Recall the last 2 of our goals which haven’t been accomplished:
decoupling the dependent interpretations as much as possible:
Which is to say,
lam_2
andlam_3
shouldn’t aware howlam_1
gets implemented, and it’s easy to satisfy:let lam_3 (argname: string) (body: r) = let body_o: o = project body in let o = lam_1 argname body_o in (* HIGHLIGHTING HERE! *) ...
Of course,
lam_1
, and anything else for implementing the prior interpretation(a.k.aA
), shouldn’t be referred inlam_2
’s implementation, or howlam_3
useslam_2
.resolving the dependency relationships among dependent interpretations:
Hence,
lam_2
should use the result of the prior interpretation(a.k.aA
), and it’s quite easy as well:let lam_3 (argname: string) (body: r) = let body_o: o = project body in let o = lam_1 argname body_o in let c = lam_2 o argname body in (* HIGHLIGHTING HERE! *) combine o c
The whole code for grow
can be found at final.ml L28-L59,
but notice that the type repr
in SYM
is written in a shorter form r
.
* In fact, if we use lazy types as the repr
of each interpretation/phase,
the order of interpretation can be more flexible.
Check Lamu0
in the sub-section Application
.
Application¶
Lamu0 gives a very simple example to compose the existing and decoupled frameworks for compilers.
Scoping: Name Resolution¶
An existing simple framework, remu_scope, designed for name resolution, also written by me, provides following 3 major APIs:
val require: env -> scoperef -> name -> sym
val enter: env -> scoperef -> name -> sym
val subscope: env -> scoperef -> scoperef
For example, to solve the scope of following code:
let x = 1 in x
We can do:
let env = empty_env() in
let root: scoperef = 0 in
let let_scope = subscope env root in
let x_assign = enter let_scope "x" in
let x_load = require let_scope "x"
With this snippet, you can check assert (x_assign = x_load)
.
With tagless final extended by FSYM
abstraction and above existing framework, we can then implement a standalone but composable interpretation for name resolution:
module Scoping = Remu_scope.Solve
type scopedesc =
| Sym of Scoping.sym
| ScopeUnrelated (* for expressions that're not variables *)
type scopeinfo = {desc: scopedesc; i: Scoping.scoperef}
module type STScope = sig
type o
type c = scopeinfo Lazy.t
type r
val env : Scoping.env
val cur_scoperef : Scoping.scoperef ref
val combine: o -> c -> r
val project: r -> o
val get: r -> scopeinfo
end
module FSYMScope(ST : STScope) = struct
include ST
let letl : o -> string -> r -> r -> c = ...
let lam: o -> string -> r -> c = ...
let app: o -> r -> r -> c = ...
let lit: o -> litype -> string -> c = ...
let var: o -> string -> c = ...
end
The whole code can be found at lamu0_ast.ml L5-L60.
We unroll the implementation of lam
:
(*
let subscope () = Scoping.subscope ST.env (!ST.cur_scoperef)
let enter n = Scoping.enter ST.env (!ST.cur_scoperef) n
let with_scope si' f =
let si = !ST.cur_scoperef in
ST.cur_scoperef := si';
let ret = f() in
ST.cur_scoperef := si;
{desc=ret; i=si}
*)
let lam: o -> string -> r -> c = fun _ n e -> lazy begin
let si' = subscope () in
with_scope si' @@ fun () ->
let _ = enter n in
let _ = get e in
ScopeUnrelated end
It’s pretty easy, and can be composed into the compilation pipeline, for every programming language whose scope could be expressed by remu_scope
.
Typing: Type Inference¶
Type inference requires already knowing the scope information.
So it depends on the previous phase, name resolution.
Firstly we check an existing framework providing type inference, remu_ts.
And we just use a very limited part of remu_ts
, here’s an example of this framework:
To infer the types of code,
val f : forall a. 'a -> 'a -> bool
let x = 1 in f x y
We write
open Remu_ts.Infer
open Remu_ts.Comm
open Remu_ts.Builder
module TC : TState = (val crate_tc empty_tctx : TState)
let _ = let open TC in
let intt = new_type "int" in
let boolt = new_type "bool" in
let x = new_tvar() in
let y = new_tvar() in
let f = Forall(["a"], Arrow(Fresh "a", Arrow(Fresh "a", boolt))) in
(* x = 1 *)
assert (unify x intt);
(* f x y *)
let arg1 = new_tvar() in
let arg2 = new_tvar() in
assert (unify arg1 x);
assert (unify arg2 y);
let func = Arrow(arg1, Arrow(arg2, boolt)) in
assert (unify f func);
let print_ty name x =
Printf.printf "%s: %s\n" name @@
dumpstr
(mk_show_named_nom (module TC)) @@
prune x
in
print_ty "x" x;
print_ty "y" y;
print_ty "func" func
After running this file, we got
x: ^int
y: ^int
func: ^int -> ^int -> ^bool
The implementation of FSYM
to leverage above existing framework is:
module Typing = Remu_ts.Infer
module type STType = sig
type o
type c = Typing.t Lazy.t
type r
val combine: o -> c -> r
val project: r -> o
(* type checking states *)
val tc: (module Typing.TState)
(* from repr to type *)
val rtype: r -> Typing.t
(* from symbol to type *)
val ntype: o -> Scoping.name -> Typing.t
(* annotate symbol's type *)
val ann: o -> Scoping.name -> Typing.t -> unit
(* basic types *)
val intt: Typing.t
val strt: Typing.t
val floatt: Typing.t
end
exception TypeError
module FSYMType(ST: STType) = struct
include ST
module TC = (val tc)
open TC
let letl : o -> string -> r -> r -> c = ...
let lam: o -> string -> r -> c = ...
let app: o -> r -> r -> c = ...
let lit: o -> litype -> string -> c = ...
let var: o -> string -> c = ...
end
The whole code of this could be found at Lamu0_ast.ml #L62-L129.
For a rough sketch, let’s check the implementation lam
again:
let lam o n e = lazy begin
let eo = project e in
let var_of_arg = new_tvar() in
ann eo n var_of_arg;
Typing.Arrow(var_of_arg, rtype e) end
Finally, we assembly things together, and make a type inferencer for Lamu0
at main.ml.
You can run the type infer REPL with: dune exec lamu0 --profile release
:
let x = 1 in x;;
=> expr0 : ^int
let y =
let f = fn x => fn y => y in
let g = f 1 2.0 in
f
in y;;
=> expr0 : ^int -> ^float -> ^float
let f = fn y => y "123" in f (fn x => x);;
=> expr0 : ^string