Some Thoughts About The Restrain JIT¶

Yesterday is the PyConChina 2019, and I made a talk about the JIT stuffs. Actually, about my recent project, the Restrain Python JIT .

In this project I did a backend-independent IR generation from the existing Python bytecode instructions, and made an instance back end targeting the Julia Language.

The instructions passed to Julia can be described with following specification:

P.S : this instructions are only for Julia back end, but the process of instruction generation will work for all prospective back ends due to an emulation of Tagless Final Style in Python:

data A lhs:typing.Optional[str] rhs:'Instr';

abc Instr;
data App(Instr) f:Repr args:typing.List[Repr];
data Ass(Instr) reg:Reg val:Repr;
data Load(Instr) reg:Reg;
data Store(Instr) reg:Reg val:Repr;
data JmpIf(Instr) label:str cond:Repr;
data JmpIfPush(Instr) label:str cond:Repr leave:Repr;
data Jmp(Instr) label:str;
data Label(Instr) label:str;
data Peek(Instr) offset:int;
data Return(Instr) val:Repr;
data Push(Instr) val:Repr;
data Pop(Instr) ;
data PyGlob(Instr) qual:str name:str;
data JlGlob(Instr) qual:str name:str;
data UnwindBlock(Instr) instrs:typing.List[A];
data PopException(Instr) must:bool;

Above code, explicitly, is for generating Python dataclasses, and for instance,

abc Instr;

corresponds to

class Instr(abc.ABC): pass

and,

data App(Instr) f:Repr args:typing.List[Repr];

generates

@dataclass
class App(Instr):
    f:Repr
    args:typing.List[Repr]

where Repr is short for representation :

abc Repr;
data Reg(Repr) n:str;
data Const(Repr) val:object;

It’s indeed a small set of instructions, and it’s a mixture of stack machine instructions and register machine instructions.

The reason why I did this is, Python bytecode instructions are based on a stack machine and I cannot trivially eliminate the Python’s Push (e.g., LOAD_FAST , LOAD_DEREF ) and Pop (e.g., POP_TOP ) instructions without writing a pass to analyze the control flows(i.e, the jump instructions by for-loops).

After generating the instructions for Julia back end, I use an existing Python-Julia bridging library to send the those instructions to Julia, and Julia converts them to the regulr Julia objects, actually, in the form of algebraic data types:

@data Instr begin
    Value(v::Any)
    App(f:: Repr, args:: Vec{Repr})
    Ass(reg::Reg, val::Repr)
    Load(reg::Reg)
    Store(reg::Reg, val::Repr)
    JmpIf(label::Symbol, cond::Repr)
    JmpIfPush(label::Symbol, cond::Repr, leave::Repr)
    Jmp(label::Symbol)
    Label(label::Symbol)
    Peek(offset::Int)
    Return(val::Repr)
    Push(val::Repr)
    Pop()
    PyGlob(sym::Symbol)
    JlGlob(qual::Union{Nothing, Symbol}, name::Symbol)
    UnwindBlock(instrs::Vec{<:AbsA})
    PopException(must::Bool)
end

Then codegen from the instructions.

However, Julia cannot eval the generated codes directly, otherwise will suffer from the “World Age Problem” .

To end this, I introduced my GG.jl in the JIT back end implementation, and finally the whole system works.

However, there’re still performance problems.

The Python jump instructions will finally result into some goto statements(via @goto and @label) in Julia, and I said before that I haven’t eliminated push-pop instructions, which means I have to introduce a stack in the generated code. And it gets severe when handling Python try statements, due to the some reason I have to introduce another stack for storing exceptions.

Above cases make it slow for Python’s try and for. Although there’re easy workarounds( monad s instead of try statements, and the foreach function instead of for statements) to avoid suffering the performance loss, the compile time overhead is another disaster.

So, I came up with an idea of transforming the generated instructions to fast Julia functions, and I guess it might be a bit faster to compile than using my GG.jl .

I just posted the idea here today, though without an implementation yet.

Instead of converting generated instructions of Python objects to Julia instances, we can also translate them to Julia types, and then we write an interpreter-like function in Julia to interpret the types, and the fast performance of Julia’s generated functions may help this case, and even do better to eliminate my stack encoding. Of course, it’s just a guess now. I tried to finish it today, but it seems still need more time.

I’ll use a pure stack machine instructions to show the design:

make some types for representing instructions

abstract type S end
struct S0 <: S end
struct S1{T} <: S end
struct S2{T, G} <: S end

struct App{N} end
struct Push{Val} end
struct Pop end
struct Ass{Sym} end
struct Label{L} end
struct Jmp{L} end
struct JmpIf{L} end
struct Return end

based on the types, I write an interpreter-like generated function:

function interp(_, ::Type{S2{Return, Tail}}, __stack__::Tuple{Hd, Tl}, _, _)::Hd where {Tail, Hd, Tl}
    stack[1]
end

pop_n(x, n) = n === 0 ? (Expr[], x) :
    begin vs, tl = pop_n(:($x[1]), n-1)
        push!(vs, :($x[0]))
        vs, tl
    end

@generated function interp(
    past_instrs::Type{Past},
    left_instrs::Type{S2{App{N}, Follow}},
    __stack__,
    ::Type{S1{LocalNames}},
    localvars...
) where {Past, N, LocalNames, Follow}
    n = hd.parameters[1]
    elts, tl = pop_n(:__stack__, N)
    f = elts[1]
    args = view(elts, 2:n+1)

    quote
        __stack__ = ($f($(args...)), $tl)
        interp(S2{App{N}, Past}, Follow, __stack__, S1{Names}, localvars...)
    end
end

Some explanations:

The argument past_instrs for prospective jumping back. To implement loops, jumping back is needed.
The argument localvars is all local variables could be accessed in the function.
The type parameter LocalNames is for statically looking up local variables from its names.

I’m to implement it in the future, to examine if this way will keep performance and reduce the JIT overhead.

If not, the Julia back end for the Restrain Python JIT might be useless(unless for computations in really big scales).

If Julia cannot be a good JIT back end for CPython, I’ll try JVM. JVM is a stack machine and it’s not that slow to boot up. Further, the .class files can be loaded dynamically and also with JIT support, and my friends who know much about Java and JVM told me JVM is also good for optimizing the dynamically typed codes.

However, if I do use JVM as a back end in the future, I think keeping CPython’s C extensions working will be difficult.

Actually I’m not sure, because I didn’t know much about JVM, but if it’s the truth, I’ll give up trying JVM.

Keeping CPython Compatible is the most important, in my opinion.