NOTE: If you'd install `remu_ts` and `remu_scope`, clone them from [RemuLang](https://github.com/RemuLang/) and use `opam install .` to install them locally. # FSYM: An Abstraction On Tagless-Final Style To Compositing And Decoupling Multiple Interpretations `Lamu0` is a very simple and basic programming language, but the implementation in [plfp/lamu0](https://github.com/thautwarm/plfp/tree/master/lamu0) shows an approach to use Tagless Final to - handle the case that the order of interpretations is significant. - allow the composition of decoupled compiler phases Given the grammar ```ocaml type litype = IntT | FloatT | StringT module type SYM = sig type repr val letl : string -> repr -> repr -> repr val lam : string -> repr -> repr val app : repr -> repr -> repr val lit : litype -> string -> repr val var : string -> repr end ``` ## Terminology The following terms represent the same thing in this presentation. - interpretation (from a separate view of compiler) - `SYM` (from the view of routine implementation in OCaml) - algebra (from a view of mathematics) - compiler phase(from an overall view from compiler) ## Background The steps through `type repr` to the executable low level instructions are complex: Usually, a phase `A` should be performed prior to some phase `B`, so `B` depends on `A`. For instance, if the last phase is back end code generation, it for sure depends on all of other phases. ``` A -> B -> ... -> C \ \ | .. .. | \ \|/ CodeGen for each phase, it's either prior to phase CodeGen or phase CodeGen itself. ``` Another intuituve example, to perform type checking/inference, we shall understand the scope information firstly. ``` --------------------- | | let x = 1 | identify the occurrences of let y = 2 | a type variable in y + x-------------| in x--------------------- ``` More instances could be found in the real world: - All type-directed code generations require the information from (partially) type inference. - The compilation for type classes, in the approach of dictionary passing, requires type information, and a scope for class instances. - The closure conversion needs scope information to get the free variables of a function. - etc... The tagless final approach works for the polymorphisms of interpreting a given "grammar", however, it lacks of the facilities to work with multuple separate interpretations, like - **compositing separate interpretations** - **resolving the dependency relationships among dependent interpretations** - **decoupling the dependent interpretations as much as possible** To address these problems, I proposed - an operation(`grow`) among the algebras to make reductions, - a module signature(`FSYM`) very close to the shape of `SYM` to composite and decouple interpretations. which could be taken advantage of to solve above 3 pain spots fairly easy. ## Introduction The core idea is the generalization of `fold` operation, and I call it `grow`. `grow (grow (grow zero fst) snd) third`, where `zero`, `grow (zero, fst)`, and similar stuffs are algebras(`module SYM`), however, all of them have distinct representations(usually written as `type repr` in a `SYM`). In this design, for `grow(m, mexpander)`, where `m` is an algebra, and `mexpander` can produce a new algebra(e.g., `scope->scope+type`) by composing with `m` via `grow`. Something occurs naturally and simultaneously during expanding the algebra is, the expansion of the representation type(`type repr` in an algebra). ```ocaml A : module SYM with type repr = o (* base *) B : module SYM with type repr = r (* result *) ``` We check the "delta" from `o = A.repr` to `r = B.repr`, and represent it by type `c`, in other words: **`grow` transforms the type `repr` in the algebra(`SYM`) from `o` to `r`, with a delta `c`, while transforming the interpretation from `A` to `A+B`.** We extract the type expanding function out: ```ocaml val combine : o -> c -> r ``` And extract the interpretation expanding function out: ```ocaml val grow: 'o 'c 'r. (module SYM with type repr = 'o) -> (module FSYM with type o ='o and type c = 'c and type r = 'r) -> (module SYM with type repr = 'r) ``` So the current goal is to extract the structure of `FSYM`, and make sure it satisfy our final goals: - compositing separate interpretations: This is already explicit and natural, according the type of `grow` function. - resolving the dependency relationships among dependent interpretations: We're supposed to make sure `FSYM` can use the interpreted result of `module SYM with type repr = o` to implement the interpretion `module SYM with type repr=r`. - decoupling the dependent interpretations as much as possible: We're supposed to decouple the implementation of `module SYM with type repr = o` and `FSYM`. That is to say, we don't have to, and even shouldn't know or aware the implementation of `module SYM with type repr = o` when we're impelementing `FSYM`. We point out that, the following signature would suffice ```ocaml module type FSYM = sig type r type c (*delta*) type o val combine : o -> c -> r val project : r -> o val letl : o -> string -> r -> r -> c val lam : o -> string -> r -> c val app : o -> r -> r -> c val lit : o -> litype -> string -> c val var : o -> string -> c end ``` Deriving an `FSYM` from `SYM` is trivial: Besides the common part ```ocaml sig type o type c (* delta from o to c *) type r val combine : o -> c -> r val project : r -> o end ``` Any operator of type `a -> b -> ... -> r` in `SYM`, is supposed to be transformed to `o -> a -> b -> ... -> c` in `FSYM`, where the symbols `o, c, r` keep the same meanings as the aforementioned: - `o` is the original `repr` of the original algebra `A`, - `r` is the `repr` of the result algebra(`A+B`) transformed by `grow(A, B)`, - `c` is the delta of the change from `o` to `r`. There's no algebra like `SYM`, but exactly a functor `A->A+B`. To elaborate, we can use `lam` operator as an examplar: - in `SYM`/algebra/interpretation `A`: ```ocaml val lam_1: string -> o -> o ``` - in **`FSYM`**, `A->A+B`: ```ocaml val lam_2: o -> string -> r -> c ``` - in `SYM`/algebra/interpretation `A+B` ```ocaml val lam_3: string -> r -> r ``` We now need to implement `lam_3` via `lam_1` and `lam_2`. Recall the last 2 of our goals which haven't been accomplished: - decoupling the dependent interpretations as much as possible: Which is to say, `lam_2` and `lam_3` shouldn't aware how `lam_1` gets implemented, and it's easy to satisfy: ```ocaml let lam_3 (argname: string) (body: r) = let body_o: o = project body in let o = lam_1 argname body_o in (* HIGHLIGHTING HERE! *) ... ``` Of course, `lam_1`, and anything else for implementing the prior interpretation(a.k.a `A`), shouldn't be referred in `lam_2`'s implementation, or how `lam_3` uses `lam_2`. - resolving the dependency relationships among dependent interpretations: Hence, `lam_2` should use the result of the prior interpretation(a.k.a `A`), and it's quite easy as well: ```ocaml let lam_3 (argname: string) (body: r) = let body_o: o = project body in let o = lam_1 argname body_o in let c = lam_2 o argname body in (* HIGHLIGHTING HERE! *) combine o c ``` The whole code for `grow` can be found at [final.ml L28-L59](https://github.com/thautwarm/plfp/blob/2745e4791ac2b6ea9102515b9d2cf8d375de4660/lamu0/lib/final.ml#L28-L59), but notice that the type `repr` in `SYM` is written in a shorter form `r`. \* In fact, if we use lazy types as the `repr` of each interpretation/phase, the order of interpretation can be more flexible. Check `Lamu0` in the sub-section `Application`. ## Application [Lamu0](https://github.com/thautwarm/plfp/tree/master/lamu0) gives a very simple example to compose the existing and decoupled frameworks for compilers. ### Scoping: Name Resolution An existing simple framework, [remu_scope](https://github.com/RemuLang/remu-scope), designed for name resolution, also written by me, provides following 3 major APIs: ```ocaml val require: env -> scoperef -> name -> sym val enter: env -> scoperef -> name -> sym val subscope: env -> scoperef -> scoperef ``` For example, to solve the scope of following code: ```ocaml let x = 1 in x ``` We can do: ```ocaml let env = empty_env() in let root: scoperef = 0 in let let_scope = subscope env root in let x_assign = enter let_scope "x" in let x_load = require let_scope "x" ``` With this snippet, you can check `assert (x_assign = x_load)`. With tagless final extended by `FSYM` abstraction and above existing framework, we can then implement a standalone but composable interpretation for name resolution: ```ocaml module Scoping = Remu_scope.Solve type scopedesc = | Sym of Scoping.sym | ScopeUnrelated (* for expressions that're not variables *) type scopeinfo = {desc: scopedesc; i: Scoping.scoperef} module type STScope = sig type o type c = scopeinfo Lazy.t type r val env : Scoping.env val cur_scoperef : Scoping.scoperef ref val combine: o -> c -> r val project: r -> o val get: r -> scopeinfo end module FSYMScope(ST : STScope) = struct include ST let letl : o -> string -> r -> r -> c = ... let lam: o -> string -> r -> c = ... let app: o -> r -> r -> c = ... let lit: o -> litype -> string -> c = ... let var: o -> string -> c = ... end ``` The whole code can be found at [lamu0_ast.ml L5-L60](https://github.com/thautwarm/plfp/blob/2745e4791ac2b6ea9102515b9d2cf8d375de4660/lamu0/lib/lamu0_ast.ml#L5-L60). We unroll the implementation of `lam`: ```ocaml (* let subscope () = Scoping.subscope ST.env (!ST.cur_scoperef) let enter n = Scoping.enter ST.env (!ST.cur_scoperef) n let with_scope si' f = let si = !ST.cur_scoperef in ST.cur_scoperef := si'; let ret = f() in ST.cur_scoperef := si; {desc=ret; i=si} *) let lam: o -> string -> r -> c = fun _ n e -> lazy begin let si' = subscope () in with_scope si' @@ fun () -> let _ = enter n in let _ = get e in ScopeUnrelated end ``` It's pretty easy, and can be composed into the compilation pipeline, for every programming language whose scope could be expressed by `remu_scope`. ### Typing: Type Inference Type inference requires already knowing the scope information. So it depends on the previous phase, name resolution. Firstly we check an existing framework providing type inference, [remu_ts](https://github.com/RemuLang/remu-type-system). And we just use a very limited part of `remu_ts`, here's an example of this framework: To infer the types of code, ```ocaml val f : forall a. 'a -> 'a -> bool let x = 1 in f x y ``` We write ```ocaml open Remu_ts.Infer open Remu_ts.Comm open Remu_ts.Builder module TC : TState = (val crate_tc empty_tctx : TState) let _ = let open TC in let intt = new_type "int" in let boolt = new_type "bool" in let x = new_tvar() in let y = new_tvar() in let f = Forall(["a"], Arrow(Fresh "a", Arrow(Fresh "a", boolt))) in (* x = 1 *) assert (unify x intt); (* f x y *) let arg1 = new_tvar() in let arg2 = new_tvar() in assert (unify arg1 x); assert (unify arg2 y); let func = Arrow(arg1, Arrow(arg2, boolt)) in assert (unify f func); let print_ty name x = Printf.printf "%s: %s\n" name @@ dumpstr (mk_show_named_nom (module TC)) @@ prune x in print_ty "x" x; print_ty "y" y; print_ty "func" func ``` After running this file, we got ``` x: ^int y: ^int func: ^int -> ^int -> ^bool ``` The implementation of `FSYM` to leverage above existing framework is: ```ocaml module Typing = Remu_ts.Infer module type STType = sig type o type c = Typing.t Lazy.t type r val combine: o -> c -> r val project: r -> o (* type checking states *) val tc: (module Typing.TState) (* from repr to type *) val rtype: r -> Typing.t (* from symbol to type *) val ntype: o -> Scoping.name -> Typing.t (* annotate symbol's type *) val ann: o -> Scoping.name -> Typing.t -> unit (* basic types *) val intt: Typing.t val strt: Typing.t val floatt: Typing.t end exception TypeError module FSYMType(ST: STType) = struct include ST module TC = (val tc) open TC let letl : o -> string -> r -> r -> c = ... let lam: o -> string -> r -> c = ... let app: o -> r -> r -> c = ... let lit: o -> litype -> string -> c = ... let var: o -> string -> c = ... end ``` The whole code of this could be found at [Lamu0_ast.ml #L62-L129](https://github.com/thautwarm/plfp/blob/2745e4791ac2b6ea9102515b9d2cf8d375de4660/lamu0/lib/lamu0_ast.ml#L62). For a rough sketch, let's check the implementation `lam` again: ```ocaml let lam o n e = lazy begin let eo = project e in let var_of_arg = new_tvar() in ann eo n var_of_arg; Typing.Arrow(var_of_arg, rtype e) end ``` Finally, we assembly things together, and make a type inferencer for `Lamu0` at [main.ml](https://github.com/thautwarm/plfp/blob/master/lamu0/bin/main.ml). You can run the type infer REPL with: `dune exec lamu0 --profile release`: ``` let x = 1 in x;; => expr0 : ^int let y = let f = fn x => fn y => y in let g = f 1 2.0 in f in y;; => expr0 : ^int -> ^float -> ^float let f = fn y => y "123" in f (fn x => x);; => expr0 : ^string ``` ## References - [Algebra](http://okmij.org/ftp/tagless-final/Algebra.html) - [Tagless-Final Optimizations, Algebraically and Semantically](http://okmij.org/ftp/tagless-final/course2/)