Some Thoughts About The Restrain JIT
Yesterday was PyConChina 2019, and I gave a talk about JIT stuff: actually, about my recent project, the Restrain Python JIT.
In this project, I implemented backend-independent IR generation from the existing Python bytecode instructions, and made an example back end targeting the Julia language.
The instructions passed to Julia can be described with the following specification:
P.S.: these instructions are only for the Julia back end, but the process of instruction generation works for all prospective back ends, due to an emulation of the Tagless Final style in Python:
data A lhs:typing.Optional[str] rhs:'Instr';
abc Instr;
data App(Instr) f:Repr args:typing.List[Repr];
data Ass(Instr) reg:Reg val:Repr;
data Load(Instr) reg:Reg;
data Store(Instr) reg:Reg val:Repr;
data JmpIf(Instr) label:str cond:Repr;
data JmpIfPush(Instr) label:str cond:Repr leave:Repr;
data Jmp(Instr) label:str;
data Label(Instr) label:str;
data Peek(Instr) offset:int;
data Return(Instr) val:Repr;
data Push(Instr) val:Repr;
data Pop(Instr);
data PyGlob(Instr) qual:str name:str;
data JlGlob(Instr) qual:str name:str;
data UnwindBlock(Instr) instrs:typing.List[A];
data PopException(Instr) must:bool;
The above code generates Python dataclasses. For instance,
abc Instr;
corresponds to
class Instr(abc.ABC): pass
and,
data App(Instr) f:Repr args:typing.List[Repr];
generates
@dataclass
class App(Instr):
    f: Repr
    args: typing.List[Repr]
where Repr is short for representation:
abc Repr;
data Reg(Repr) n:str;
data Const(Repr) val:object;
It’s indeed a small set of instructions: a mixture of stack machine and register machine instructions.
The reason I did this is that Python bytecode instructions are based on a stack machine, and I cannot trivially eliminate Python’s push instructions (e.g., LOAD_FAST, LOAD_DEREF) and pop instructions (e.g., POP_TOP) without writing a pass to analyze the control flow (i.e., the jump instructions produced by for loops).
After generating the instructions for the Julia back end, I use an existing Python-Julia bridging library to send those instructions to Julia, and Julia converts them into regular Julia objects, in the form of algebraic data types:
@data Instr begin
Value(v::Any)
App(f::Repr, args::Vec{Repr})
Ass(reg::Reg, val::Repr)
Load(reg::Reg)
Store(reg::Reg, val::Repr)
JmpIf(label::Symbol, cond::Repr)
JmpIfPush(label::Symbol, cond::Repr, leave::Repr)
Jmp(label::Symbol)
Label(label::Symbol)
Peek(offset::Int)
Return(val::Repr)
Push(val::Repr)
Pop()
PyGlob(sym::Symbol)
JlGlob(qual::Union{Nothing, Symbol}, name::Symbol)
UnwindBlock(instrs::Vec{<:AbsA})
PopException(must::Bool)
end
Then I do code generation from these instructions.
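To give a feel for this step, here is a minimal hand-written sketch, not the actual Restrain codegen: it lowers a few instructions to Julia Exprs by pattern matching (assuming the @data above comes from MLStyle.jl) and assumes a Repr ADT mirroring the Python spec, with Vec being an alias of Vector:

using MLStyle

@data Repr begin   # mirrors the Python spec; defined before Instr in practice
    Reg(n::Symbol)
    Const(val::Any)
end

codegen_repr(r::Repr) = @match r begin
    Reg(n)   => n   # a register becomes a plain Julia variable
    Const(v) => v   # a constant is spliced in literally
end

codegen_instr(instr::Instr) = @match instr begin
    Ass(reg, val) => :($(codegen_repr(reg)) = $(codegen_repr(val)))
    Push(val)     => :(push!(__stack__, $(codegen_repr(val))))
    Pop()         => :(pop!(__stack__))
    Jmp(label)    => :(@goto $label)
    Label(label)  => :(@label $label)
    Return(val)   => :(return $(codegen_repr(val)))
end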
However, Julia cannot eval the generated code directly, otherwise it will suffer from the “World Age Problem”.
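For reference, this is the world age problem in its standard minimal form (nothing Restrain-specific):

function run_once()
    f = eval(:(x -> x + 1))  # `f` is born in a newer world age
    f(1)                     # MethodError: the method is "too new"
end

Base.invokelatest would escape the error, but it pays for dynamic dispatch on every call, which is unacceptable for a JIT.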
To get around this, I introduced my GG.jl into the JIT back end implementation, and finally the whole system works.
However, there are still performance problems.
The Python jump instructions finally turn into goto statements (via @goto and @label) in Julia, and as I said before, I haven’t eliminated the push-pop instructions, which means I have to introduce a stack in the generated code. It gets worse when handling Python try statements: for a similar reason, I have to introduce another stack for storing exceptions.
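To make the cost concrete, here is a hand-written sketch (not actual Restrain output) of roughly what a lowered loop looks like once jumps become @goto/@label and the operand stack stays explicit:

function lowered_sum(xs)
    __stack__ = Any[]       # the operand stack I could not eliminate
    s = 0
    push!(__stack__, iterate(xs))
    @label head
    it = pop!(__stack__)
    it === nothing && @goto done
    x, st = it
    s += x                  # the actual loop body
    push!(__stack__, iterate(xs, st))
    @goto head
    @label done
    return s
end

The untyped Any stack forces boxing on every push and pop, which is exactly where the slowdown comes from.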
The above cases make Python’s try and for slow. Although there are easy workarounds to avoid the performance loss (monads instead of try statements, and the foreach function instead of for statements, as illustrated below), the compile-time overhead is another disaster.
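For instance, the foreach workaround looks like this (a trivial illustration; it assumes the back end can emit closures for loop bodies):

foreach(1:3) do x
    println(x)  # the loop body becomes a closure: no labels, no operand stack
end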
So I came up with the idea of transforming the generated instructions into fast Julia functions, which I guess might be a bit faster to compile than going through my GG.jl.
I just posted the idea here today, though without an implementation yet.
Instead of converting the generated instructions from Python objects into Julia instances, we can translate them into Julia types, and then write an interpreter-like function in Julia to interpret those types. The performance of Julia’s generated functions may help in this case, and might even eliminate my stack encoding. Of course, it’s just a guess for now; I tried to finish it today, but it seems to need more time.
I’ll use pure stack machine instructions to show the design:
- make some types for representing the instructions:
abstract type S end
struct S0 <: S end        # empty type-level list
struct S1{T} <: S end     # singleton wrapper (e.g. carries the local names)
struct S2{T, G} <: S end  # cons cell: head `T`, tail `G`

struct App{N} end         # call: pop a function and its N arguments
struct Push{Val} end      # push the constant `Val`
struct Pop end            # drop the top of the stack
struct Ass{Sym} end       # assign the top of the stack to local `Sym`
struct Label{L} end       # jump target `L`
struct Jmp{L} end         # unconditional jump to `L`
struct JmpIf{L} end       # conditional jump to `L`
struct Return end         # return the top of the stack
- based on these types, I write an interpreter-like generated function (a runnable usage sketch follows this list):
function interp(_, ::Type{S2{Return, Tail}}, __stack__::Tuple{Hd, Tl},
                _, localvars...)::Hd where {Tail, Hd, Tl}
    __stack__[1]  # `Return` yields the top of the operand stack
end
# build the Exprs that read the top `n` values of a (head, tail)-encoded
# stack, together with the Expr for the remaining stack; the popped values
# come out deepest-first
pop_n(x, n) = n === 0 ? (Expr[], x) :
    begin
        vs, tl = pop_n(:($x[2]), n - 1)  # recurse into the tail
        push!(vs, :($x[1]))              # the head is one popped value
        vs, tl
    end
@generated function interp(
        past_instrs::Type{Past},
        left_instrs::Type{S2{App{N}, Follow}},
        __stack__,
        ::Type{S1{LocalNames}},
        localvars...
    ) where {Past, N, LocalNames, Follow}
    # pop the function and its N arguments; the function was pushed first,
    # so it is the deepest of the N + 1 popped values
    elts, tl = pop_n(:__stack__, N + 1)
    f = elts[1]
    args = elts[2:end]
    quote
        __stack__ = ($f($(args...)), $tl)
        interp(S2{App{N}, Past}, Follow, __stack__, S1{LocalNames}, localvars...)
    end
end
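To show how the pieces fit together, here is a hypothetical driver for the sketch above; the Push method and the program encoding are my own additions for illustration, and S1{()} carries an (empty) tuple of local variable names:

# hypothetical: a plain method suffices for Push, since `Val` is already
# a static type parameter
function interp(::Type{Past}, ::Type{S2{Push{Val}, Follow}},
                __stack__, names::Type{S1{LocalNames}},
                localvars...) where {Past, Val, Follow, LocalNames}
    interp(S2{Push{Val}, Past}, Follow, (Val, __stack__), names, localvars...)
end

add(a, b) = a + b  # a tiny function to push onto the stack

# encode `add(1, 2)` as a type-level instruction list and run it
const Prog = S2{Push{add}, S2{Push{1}, S2{Push{2}, S2{App{2}, S2{Return, S0}}}}}
interp(S0, Prog, (), S1{()})  # => 3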
Some explanations:
- The argument past_instrs exists for prospective backwards jumps; to implement loops, jumping back is needed.
- The argument localvars holds all the local variables accessible in the function.
- The type parameter LocalNames is for statically looking up local variables by name (a sketch of how this could work follows this list).
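As a sketch of that static lookup (Load{Sym} and the index computation below are hypothetical; Load is not part of the instruction set shown above): the index of a local is computed once at generation time, so reading a variable compiles down to a plain positional access on localvars:

struct Load{Sym} end  # hypothetical: read the local variable named `Sym`

@generated function interp(::Type{Past}, ::Type{S2{Load{Sym}, Follow}},
                           __stack__, ::Type{S1{LocalNames}},
                           localvars...) where {Past, Sym, Follow, LocalNames}
    # resolved at generation time; assumes `Sym` is present in `LocalNames`
    i = findfirst(==(Sym), collect(LocalNames))
    quote
        interp(S2{Load{Sym}, Past}, Follow, (localvars[$i], __stack__),
               S1{LocalNames}, localvars...)
    end
end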
I’m going to implement this in the future, to examine whether this approach keeps the performance while reducing the JIT overhead.
If not, the Julia back end for the Restrain Python JIT might be useless (except for really large-scale computations).
If Julia cannot be a good JIT back end for CPython, I’ll try the JVM. The JVM is a stack machine and it’s not that slow to boot up. Further, .class files can be loaded dynamically, with JIT support, and my friends who know a lot about Java and the JVM told me the JVM is also good at optimizing dynamically typed code.
However, if I do use the JVM as a back end in the future, I think keeping CPython’s C extensions working will be difficult.
Actually I’m not sure, because I don’t know much about the JVM, but if that turns out to be true, I’ll give up on trying the JVM.
Keeping compatibility with CPython is the most important thing, in my opinion.