=================================================================
Some Thoughts About The Restrain JIT
=================================================================

Yesterday was PyConChina 2019, and I gave a talk about JIT stuff; actually, about my recent project, the `Restrain Python JIT`_ .

In this project I did a backend-independent IR generation from the existing Python bytecode instructions, and made an instance back end targeting the Julia Language.

The instructions passed to Julia can be described with the following specification:

**P.S** : these instructions are only for the Julia back end, but the process of instruction generation will work for all prospective back ends, due to an emulation of Tagless Final Style in Python:

- `from_bc.py`_
- `run_machine`_

::

    data A lhs:typing.Optional[str] rhs:'Instr';
    abc Instr;
    data App(Instr) f:Repr args:typing.List[Repr];
    data Ass(Instr) reg:Reg val:Repr;
    data Load(Instr) reg:Reg;
    data Store(Instr) reg:Reg val:Repr;
    data JmpIf(Instr) label:str cond:Repr;
    data JmpIfPush(Instr) label:str cond:Repr leave:Repr;
    data Jmp(Instr) label:str;
    data Label(Instr) label:str;
    data Peek(Instr) offset:int;
    data Return(Instr) val:Repr;
    data Push(Instr) val:Repr;
    data Pop(Instr) ;
    data PyGlob(Instr) qual:str name:str;
    data JlGlob(Instr) qual:str name:str;
    data UnwindBlock(Instr) instrs:typing.List[A];
    data PopException(Instr) must:bool;

The code above is, explicitly, for generating Python dataclasses. For instance,

::

    abc Instr;

corresponds to

.. code-block:: Python

    class Instr(abc.ABC):
        pass

and

::

    data App(Instr) f:Repr args:typing.List[Repr];

generates

.. code-block:: Python

    @dataclass
    class App(Instr):
        f: Repr
        args: typing.List[Repr]

where :code:`Repr` is short for **representation** :

::

    abc Repr;
    data Reg(Repr) n:str;
    data Const(Repr) val:object;

It's indeed a small set of instructions, and it's a mixture of stack machine instructions and register machine instructions.

The reason I did this is that Python bytecode instructions are based on a stack machine, and I cannot trivially eliminate Python's :code:`Push` (e.g., :code:`LOAD_FAST` , :code:`LOAD_DEREF` ) and :code:`Pop` (e.g., :code:`POP_TOP` ) instructions without writing a pass to analyze the control flow (i.e., the jump instructions produced by for-loops).

After generating the instructions for the Julia back end, I use an existing Python-Julia bridging library to send those instructions to Julia, and Julia converts them into regular Julia objects, in the form of `algebraic data types`_ :

.. code-block:: Julia

    @data Instr begin
        Value(v::Any)
        App(f::Repr, args::Vec{Repr})
        Ass(reg::Reg, val::Repr)
        Load(reg::Reg)
        Store(reg::Reg, val::Repr)
        JmpIf(label::Symbol, cond::Repr)
        JmpIfPush(label::Symbol, cond::Repr, leave::Repr)
        Jmp(label::Symbol)
        Label(label::Symbol)
        Peek(offset::Int)
        Return(val::Repr)
        Push(val::Repr)
        Pop()
        PyGlob(sym::Symbol)
        JlGlob(qual::Union{Nothing, Symbol}, name::Symbol)
        UnwindBlock(instrs::Vec{<:AbsA})
        PopException(must::Bool)
    end

Then I do codegen from the instructions. However, Julia cannot :code:`eval` the generated code directly, or it will suffer from the `"World Age Problem"`_ . To address this, I introduced my `GG.jl`_ into the JIT back end implementation, and finally the whole system works.

However, there are still performance problems. The Python jump instructions finally turn into goto statements (via :code:`@goto` and :code:`@label` ) in Julia and, as I said before, I haven't eliminated the push-pop instructions, which means I have to introduce a stack in the generated code.
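To make the cost concrete, here is a minimal hand-written sketch of the kind of code this lowering yields for a simple counting loop. It is my illustration, not actual Restrain JIT output, and all names in it are made up:

.. code-block:: Julia

    # roughly what `s = 0; for i in range(1, 11): s = s + i` lowers to
    function loop_sketch()
        __stack__ = Any[]            # the explicit stack kept for push-pop instructions
        s = 0
        i = 1
        @label loop_start            # Label
        i > 10 && @goto loop_end     # JmpIf
        push!(__stack__, s)          # Push (from e.g. LOAD_FAST)
        push!(__stack__, i)          # Push
        rhs = pop!(__stack__)        # Pop
        lhs = pop!(__stack__)        # Pop
        s = lhs + rhs                # Ass
        i += 1
        @goto loop_start             # Jmp
        @label loop_end
        return s                     # Return; loop_sketch() == 55
    end

The :code:`Any` -typed stack defeats Julia's type inference, so every :code:`pop!` and the :code:`+` call dispatch dynamically; that is where the slowdown comes from.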
And it gets worse when handling Python :code:`try` statements: for a similar reason, I have to introduce another stack for storing exceptions.

The above cases make it slow for Python's :code:`try` and :code:`for` . Although there are easy workarounds ( :code:`monad` s instead of :code:`try` statements, and the :code:`foreach` function instead of :code:`for` statements) to avoid the performance loss, the compile-time overhead is another disaster.

So I came up with an idea of transforming the generated instructions into fast Julia functions, and I guess it might be a bit faster to compile than using my `GG.jl`_ . I'm just posting the idea here today, though without an implementation yet.

Instead of converting the generated instructions from Python objects into Julia instances, we can translate them into Julia types, and then write an interpreter-like function in Julia to interpret those types. The high performance of Julia's generated functions may help here, and may even do better by eliminating my stack encoding. Of course, it's just a guess for now. I tried to finish it today, but it seems to need more time.

I'll use pure stack machine instructions to show the design:

1. Make some types for representing instructions:

   .. code-block:: Julia

       abstract type S end
       struct S0 <: S end
       struct S1{T} <: S end
       struct S2{T, G} <: S end

       struct App{N} end
       struct Push{Val} end
       struct Pop end
       struct Ass{Sym} end
       struct Label{L} end
       struct Jmp{L} end
       struct JmpIf{L} end
       struct Return end

2. Based on the types, write an interpreter-like generated function:

   .. code-block:: Julia

       # the stack is encoded as nested tuples: (top, rest)
       function interp(_, ::Type{S2{Return, Tail}}, __stack__::Tuple{Hd, Tl}, _, _...)::Hd where {Tail, Hd, Tl}
           __stack__[1]
       end

       # pop_n(x, n) builds, for the nested-tuple stack expression `x`, the
       # expressions of its n topmost elements (deepest first) and of the
       # remaining stack
       pop_n(x, n) =
           n === 0 ?
               (Expr[], x) :
               begin
                   vs, tl = pop_n(:($x[2]), n - 1)
                   push!(vs, :($x[1]))
                   vs, tl
               end

       @generated function interp(
               past_instrs::Type{Past},
               left_instrs::Type{S2{App{N}, Follow}},
               __stack__,
               ::Type{S1{LocalNames}},
               localvars...
           ) where {Past, N, LocalNames, Follow}
           # pop the callee and its N arguments off the encoded stack
           elts, tl = pop_n(:__stack__, N + 1)
           f = elts[1]
           args = view(elts, 2:N + 1)
           quote
               __stack__ = ($f($(args...)), $tl)
               interp(S2{App{N}, Past}, Follow, __stack__, S1{LocalNames}, localvars...)
           end
       end

Some explanations:

- The argument :code:`past_instrs` is kept for prospective backward jumps. To implement loops, jumping back is needed.
- The argument :code:`localvars` holds all the local variables that can be accessed in the function.
- The type parameter :code:`LocalNames` is for statically looking up local variables by their names.
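To make this concrete, here is a minimal sketch of how the design above could be driven. The :code:`Push` handler and the type-level program below are my own illustrative guesses, not code from the project:

.. code-block:: Julia

    # A hypothetical handler for Push{Val}: put the type-level constant onto
    # the nested-tuple stack and continue with the remaining instructions.
    @generated function interp(
            ::Type{Past},
            ::Type{S2{Push{Val}, Follow}},
            __stack__,
            ::Type{S1{LocalNames}},
            localvars...
        ) where {Past, Val, LocalNames, Follow}
        quote
            __stack__ = ($(QuoteNode(Val)), __stack__)
            interp(S2{Push{Val}, Past}, Follow, __stack__, S1{LocalNames}, localvars...)
        end
    end

    # Encode `(+)(2, 3); return` as a type-level instruction list;
    # the callee is pushed first, then its arguments:
    const Prog = S2{Push{+}, S2{Push{2}, S2{Push{3}, S2{App{2}, S2{Return, S0}}}}}

    # S0 marks "no past instructions yet", the stack starts empty,
    # and there are no local variables in this tiny program:
    interp(S0, Prog, (), S1{()})  # => 5

Since the whole instruction sequence lives in the type domain, every :code:`interp` step is compiled as a specialized method, and the recursive calls can, in the best case, inline into straight-line code with no runtime stack at all; whether all this specialization really compiles faster than `GG.jl`_ is exactly what has to be measured.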
I'm going to implement it in the future, to examine whether this approach keeps the performance and reduces the JIT overhead. If not, the Julia back end for the Restrain Python JIT might be useless (except for really large-scale computations).

If Julia cannot be a good JIT back end for CPython, I'll try the JVM. The JVM is a stack machine, and it's not that slow to boot up. Further, the .class files can be loaded dynamically, also with JIT support, and my friends who know much about Java and the JVM told me the JVM is also good at optimizing dynamically typed code.

However, if I do use the JVM as a back end in the future, I think keeping CPython's C extensions working will be difficult. Actually I'm not sure, because I don't know much about the JVM, but if that's true, I'll give up trying the JVM. Keeping `CPython Compatible`_ is the most important thing, in my opinion.

.. _from_bc.py: https://github.com/thautwarm/restrain-jit/blob/8c158693e4548da799bdcab2858ca2b64e1f0521/restrain_jit/abs_compiler/from_bc.py
.. _run_machine: https://github.com/thautwarm/restrain-jit/blob/8c158693e4548da799bdcab2858ca2b64e1f0521/restrain_jit/vm/am.py#L267
.. _Restrain Python JIT: https://github.com/thautwarm/restrain-jit/
.. _algebraic data types: https://github.com/thautwarm/RestrainJIT.jl/blob/cf7c1d7c0b6eb517db46ae1b2339b560e2b0a2a1/src/instr_repr.jl
.. _"World Age Problem": https://discourse.julialang.org/t/world-age-problem-explanation/9714
.. _GG.jl: https://github.com/thautwarm/GeneralizedGenerated.jl
.. _CPython Compatible: https://github.com/thautwarm/restrain-jit/blob/master/docs/What-is-CPython-Compatible.md