NOTE: If you’d install remu_ts and remu_scope, clone them from RemuLang and use opam install . to install them locally.

FSYM: An Abstraction On Tagless-Final Style To Compositing And Decoupling Multiple Interpretations

Lamu0 is a very simple and basic programming language, but the implementation in plfp/lamu0 shows an approach to use Tagless Final to

  • handle the case that the order of interpretations is significant.

  • allow the composition of decoupled compiler phases

Given the grammar

type litype = IntT | FloatT | StringT

module type SYM = sig
  type repr
  val letl : string -> repr -> repr -> repr
  val lam  : string -> repr -> repr
  val app  : repr -> repr -> repr
  val lit  : litype -> string -> repr
  val var  : string -> repr
end

Terminology

The following terms represent the same thing in this presentation. - interpretation (from a separate view of compiler) - SYM (from the view of routine implementation in OCaml) - algebra (from a view of mathematics) - compiler phase(from an overall view from compiler)

Background

The steps through type repr to the executable low level instructions are complex: Usually, a phase A should be performed prior to some phase B, so B depends on A.

For instance, if the last phase is back end code generation, it for sure depends on all of other phases.

A -> B -> ... -> C
 \   \         |
  ..   ..      |
          \   \|/
            CodeGen

for each phase, it's either prior to phase CodeGen or phase CodeGen itself.

Another intuituve example, to perform type checking/inference, we shall understand the scope information firstly.

    ---------------------
    |                   |
let x = 1               |  identify the occurrences of
   let y = 2            |  a type variable
   in y + x-------------|
in x---------------------

More instances could be found in the real world:

  • All type-directed code generations require the information from (partially) type inference.

  • The compilation for type classes, in the approach of dictionary passing, requires type information, and a scope for class instances.

  • The closure conversion needs scope information to get the free variables of a function.

  • etc…

The tagless final approach works for the polymorphisms of interpreting a given “grammar”, however, it lacks of the facilities to work with multuple separate interpretations, like

  • compositing separate interpretations

  • resolving the dependency relationships among dependent interpretations

  • decoupling the dependent interpretations as much as possible

To address these problems, I proposed - an operation(grow) among the algebras to make reductions, - a module signature(FSYM) very close to the shape of SYM to composite and decouple interpretations.

which could be taken advantage of to solve above 3 pain spots fairly easy.

Introduction

The core idea is the generalization of fold operation, and I call it grow.

grow (grow (grow zero fst) snd) third, where zero, grow (zero, fst), and similar stuffs are algebras(module SYM), however, all of them have distinct representations(usually written as type repr in a SYM).

In this design, for grow(m, mexpander), where m is an algebra, and mexpander can produce a new algebra(e.g., scope->scope+type) by composing with m via grow.

Something occurs naturally and simultaneously during expanding the algebra is, the expansion of the representation type(type repr in an algebra).

A : module SYM with type repr = o (* base *)
B : module SYM with type repr = r (* result *)

We check the “delta” from o = A.repr to r = B.repr, and represent it by type c, in other words:

``grow`` transforms the type ``repr`` in the algebra(``SYM``) from ``o`` to ``r``, with a delta ``c``, while transforming the interpretation from ``A`` to ``A+B``.

We extract the type expanding function out:

val combine : o -> c -> r

And extract the interpretation expanding function out:

val grow: 'o 'c 'r.
    (module SYM with type repr = 'o) ->
    (module FSYM with type o ='o and type c = 'c and type r = 'r) ->
    (module SYM with type repr = 'r)

So the current goal is to extract the structure of FSYM, and make sure it satisfy our final goals:

  • compositing separate interpretations:

    This is already explicit and natural, according the type of grow function.

  • resolving the dependency relationships among dependent interpretations:

    We’re supposed to make sure FSYM can use the interpreted result of module SYM with type repr = o to implement the interpretion module SYM with type repr=r.

  • decoupling the dependent interpretations as much as possible:

    We’re supposed to decouple the implementation of module SYM with type repr = o and FSYM.

    That is to say, we don’t have to, and even shouldn’t know or aware the implementation of module SYM with type repr = o when we’re impelementing FSYM.

We point out that, the following signature would suffice

module type FSYM = sig
    type r
    type c (*delta*)
    type o

    val combine  : o -> c -> r
    val project  : r -> o

    val letl : o -> string -> r -> r -> c
    val lam  : o -> string -> r -> c
    val app  : o -> r -> r -> c
    val lit  : o -> litype -> string -> c
    val var  : o -> string  -> c
end

Deriving an FSYM from SYM is trivial:

Besides the common part

sig
    type o
    type c (* delta from o to c *)
    type r

    val combine  : o -> c -> r
    val project  : r -> o
end

Any operator of type a -> b -> ... -> r in SYM, is supposed to be transformed to o -> a -> b -> ... -> c in FSYM, where the symbols o, c, r keep the same meanings as the aforementioned:

  • o is the original repr of the original algebra A,

  • r is the repr of the result algebra(A+B) transformed by grow(A, B),

  • c is the delta of the change from o to r. There’s no algebra like SYM, but exactly a functor A->A+B.

To elaborate, we can use lam operator as an examplar:

  • in SYM/algebra/interpretation A: ocaml  val lam_1: string -> o -> o

  • in ``FSYM``, A->A+B: ocaml   val lam_2: o -> string -> r -> c

  • in SYM/algebra/interpretation A+B ocaml   val lam_3: string -> r -> r

We now need to implement lam_3 via lam_1 and lam_2.

Recall the last 2 of our goals which haven’t been accomplished:

  • decoupling the dependent interpretations as much as possible:

    Which is to say, lam_2 and lam_3 shouldn’t aware how lam_1 gets implemented, and it’s easy to satisfy:

    let lam_3 (argname: string) (body: r) =
        let body_o: o = project body in
        let o = lam_1 argname body_o in (* HIGHLIGHTING HERE! *)
        ...
    

    Of course, lam_1, and anything else for implementing the prior interpretation(a.k.a A), shouldn’t be referred in lam_2’s implementation, or how lam_3 uses lam_2.

  • resolving the dependency relationships among dependent interpretations:

    Hence, lam_2 should use the result of the prior interpretation(a.k.a A), and it’s quite easy as well:

    let lam_3 (argname: string) (body: r) =
    let body_o: o = project body in
    let o = lam_1 argname body_o in
    let c = lam_2 o argname body in  (* HIGHLIGHTING HERE! *)
    combine o c
    

The whole code for grow can be found at final.ml L28-L59, but notice that the type repr in SYM is written in a shorter form r.

* In fact, if we use lazy types as the repr of each interpretation/phase, the order of interpretation can be more flexible.

Check Lamu0 in the sub-section Application.

Application

Lamu0 gives a very simple example to compose the existing and decoupled frameworks for compilers.

Scoping: Name Resolution

An existing simple framework, remu_scope, designed for name resolution, also written by me, provides following 3 major APIs:

val require: env -> scoperef -> name -> sym
val enter: env -> scoperef -> name -> sym
val subscope: env -> scoperef -> scoperef

For example, to solve the scope of following code:

let x = 1 in x

We can do:

let env = empty_env() in
let root: scoperef = 0 in
let let_scope = subscope env root in
let x_assign = enter let_scope "x" in
let x_load = require let_scope "x"

With this snippet, you can check assert (x_assign = x_load).

With tagless final extended by FSYM abstraction and above existing framework, we can then implement a standalone but composable interpretation for name resolution:

module Scoping = Remu_scope.Solve

type scopedesc =
  | Sym of Scoping.sym
  | ScopeUnrelated (* for expressions that're not variables *)

type scopeinfo = {desc: scopedesc; i: Scoping.scoperef}

module type STScope = sig
  type o
  type c = scopeinfo Lazy.t
  type r
  val env : Scoping.env
  val cur_scoperef : Scoping.scoperef ref
  val combine: o -> c -> r
  val project: r -> o
  val get: r -> scopeinfo
end

module FSYMScope(ST : STScope) = struct
    include ST
    let letl : o -> string -> r -> r -> c = ...
    let lam: o -> string -> r -> c = ...
    let app: o -> r -> r -> c = ...
    let lit: o -> litype -> string -> c = ...
    let var: o -> string -> c  = ...
end

The whole code can be found at lamu0_ast.ml L5-L60.

We unroll the implementation of lam:

(*
let subscope () = Scoping.subscope ST.env (!ST.cur_scoperef)
let enter n = Scoping.enter ST.env (!ST.cur_scoperef) n
let with_scope si' f =
    let si = !ST.cur_scoperef in
      ST.cur_scoperef := si';
      let ret = f() in
      ST.cur_scoperef := si;
      {desc=ret; i=si}
*)
let lam: o -> string -> r -> c = fun _ n e -> lazy begin
    let si' = subscope () in
    with_scope si' @@ fun () ->
    let _ = enter n in
    let _ = get e in
    ScopeUnrelated end

It’s pretty easy, and can be composed into the compilation pipeline, for every programming language whose scope could be expressed by remu_scope.

Typing: Type Inference

Type inference requires already knowing the scope information.

So it depends on the previous phase, name resolution.

Firstly we check an existing framework providing type inference, remu_ts.

And we just use a very limited part of remu_ts, here’s an example of this framework:

To infer the types of code,

val f : forall a. 'a -> 'a -> bool
let x = 1 in f x y

We write

open Remu_ts.Infer
open Remu_ts.Comm
open Remu_ts.Builder

module TC : TState = (val crate_tc empty_tctx : TState)
let _ = let open TC in
   let intt = new_type "int" in
   let boolt = new_type "bool" in
   let x = new_tvar() in
   let y = new_tvar() in
   let f = Forall(["a"], Arrow(Fresh "a", Arrow(Fresh "a", boolt))) in

   (* x = 1 *)
   assert (unify x intt);

   (* f x y *)
   let arg1 = new_tvar() in
   let arg2 = new_tvar() in
   assert (unify arg1 x);
   assert (unify arg2 y);
   let func = Arrow(arg1, Arrow(arg2, boolt)) in
   assert (unify f func);
   let print_ty name x =
        Printf.printf "%s: %s\n" name   @@
        dumpstr
        (mk_show_named_nom (module TC)) @@
        prune x
    in
   print_ty "x" x;
   print_ty "y" y;
   print_ty "func" func

After running this file, we got

x: ^int
y: ^int
func: ^int -> ^int -> ^bool

The implementation of FSYM to leverage above existing framework is:

module Typing = Remu_ts.Infer
module type STType = sig
  type o
  type c = Typing.t Lazy.t
  type r
  val combine: o -> c -> r
  val project: r -> o
  (* type checking states *)
  val tc: (module Typing.TState)
  (* from repr to type *)
  val rtype: r -> Typing.t
  (* from symbol to type *)
  val ntype: o -> Scoping.name -> Typing.t
  (* annotate symbol's type *)
  val ann: o -> Scoping.name -> Typing.t -> unit
  (* basic types *)
  val intt: Typing.t
  val strt: Typing.t
  val floatt: Typing.t
end


exception TypeError
module FSYMType(ST: STType) = struct
  include ST
  module TC = (val tc)
  open TC

  let letl : o -> string -> r -> r -> c = ...
  let lam: o -> string -> r -> c = ...
  let app: o -> r -> r -> c = ...
  let lit: o -> litype -> string -> c = ...
  let var: o -> string -> c  = ...
end

The whole code of this could be found at Lamu0_ast.ml #L62-L129.

For a rough sketch, let’s check the implementation lam again:

let lam o n e = lazy begin
    let eo = project e in
    let var_of_arg = new_tvar() in
    ann eo n var_of_arg;
    Typing.Arrow(var_of_arg, rtype e) end

Finally, we assembly things together, and make a type inferencer for Lamu0 at main.ml.

You can run the type infer REPL with: dune exec lamu0 --profile release:

let x = 1 in x;;
=> expr0 : ^int

let y =
   let f = fn x => fn y => y in
   let g = f 1 2.0 in
   f
in y;;
=> expr0 : ^int -> ^float -> ^float

let f = fn y => y "123" in f (fn x => x);;
=> expr0 : ^string