A Deep Introduction to JIT Compilers: JITs are not very Just-in-Time
If you are familiar with how JITs generally work (if you get what the title is referring to), I recommend skimming this or going straight to reading How JIT Compilers are Implemented and Fast: Julia, Pypy, LuaJIT, Graal and More.
My mentor, Chris, who took me from "what is a JIT" to where I am now, once told me that compilers are just bytes in, bytes out, and not at all low-level and scary. This is actually fairly true, and it's fun to learn about compiler internals and often useful for programmers everywhere!
This blog post gives background on how programming languages are implemented and how JITs work. It'll introduce the implementation of the Julia language, though it won't cover the specific implementation details or optimizations made by more traditional JITs. Check out How JIT Compilers are Implemented and Fast: Julia, Pypy, LuaJIT, Graal and More to read about how meta-tracing is implemented, how Graal supports C extensions, the relationship of JITs with LLVM and more!
How Programming Languages are Implemented
When we run a program, it’s either interpreted or compiled in some way. The compiler/interpreter is sometimes referred to as the "implementation" of a language, and one language can have many implementations. You may have heard things like "Python is interpreted", but that really means the reference (standard/default) implementation of Python is an interpreter. Python is a language specification and CPython is the interpreter and implementation of Python.
An interpreter is a program that directly executes your code. Well-known interpreters are usually written in C: the reference implementations of Ruby, Python and PHP are all written in C. Below is a function that loosely models how an interpreter might work:
func interpret(code string) {
    if code == "print('Hello, World!')" {
        print("Hello, World!")
    } else if code == "x = 0; x += 4; print(x)" {
        variable_x := 0
        variable_x += 4
        print(variable_x)
    }
}
A compiler is a program that translates code from one language to another, though the term usually refers to cases where the destination language is machine code. Examples of compiled languages are C, Go and Rust.
func compile(code string) {
    compiled_code := get_machine_code(code)
    write_to_executable(compiled_code)
}
The difference between a compiled and interpreted language is actually much more nuanced than that. C, Go and Rust are clearly compiled, as they output a machine code file, which can be understood natively by the computer. The compile and run steps are fully distinct.
However, compilers can translate to any target language (this is sometimes called transpiling). Java, for example, has a two-step implementation. The first is compiling Java source to bytecode, which is an Intermediate Representation (IR). The bytecode is then JIT compiled, which involves interpretation.
Python and Ruby also execute in two steps. Despite being known as interpreted languages, their reference implementations actually compile the source down to a bytecode. You may have seen .pyc files (tucked away in __pycache__ directories since Python 3) which contain Python bytecode! The bytecode is then interpreted by a virtual machine. These interpreters use bytecode because programmers tend to care less about compile time, and creating a bytecode language allows the engineers to specify a bytecode that is as efficient to interpret as possible.
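As a quick illustration (assuming a CPython 3 installation), the standard dis module will print the bytecode a function was compiled to; add_four here is just an arbitrary example function:
import dis

def add_four(x):
    x += 4
    return x

# Prints the CPython bytecode instructions for add_four, e.g. LOAD_FAST,
# an add instruction and RETURN_VALUE (exact opcode names vary by version).
dis.dis(add_four)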
Having bytecode is how languages check syntax before execution (though they could technically just do a pass before starting the interpreter). An example below shows why you would want to check syntax before runtime.
sleep(1000)
bad syntax beep boop beep boop
Another important note is that interpreted languages are typically slower, for various reasons. The most obvious is that they're executed by another, higher-level program, which adds overhead. The main reason is that the dynamic-ness of the languages they tend to implement means they need many extra instructions to decide what to do next and how to route data. People still choose to build interpreters over compilers because they're easier to build and are better suited to handle things like dynamic typing, scoping, etc. (though you could build a compiler that has the same features).
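To make that concrete, here is a toy sketch (not CPython's actual implementation) of the kind of dispatch an interpreter has to do for a single a + b: inspect the runtime types of both operands and route to the right implementation, where a compiler for a statically typed language could have emitted a single add instruction.
# Toy sketch of per-operation dynamic dispatch; real interpreters do something
# much more general, but every '+' still pays a routing cost like this.
def interp_add(a, b):
    if isinstance(a, int) and isinstance(b, int):
        return int.__add__(a, b)    # integer addition
    if isinstance(a, str) and isinstance(b, str):
        return str.__add__(a, b)    # string concatenation
    raise TypeError("unsupported operand types")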
So What is a JIT?
A JIT compiler doesn't compile code Ahead-Of-Time (AOT), but it still compiles source code to machine code and therefore is not an interpreter. JITs compile code at runtime, while your program is executing. This gives JITs flexibility for dynamic language features, while maintaining speed from optimized machine code output. JIT-compiling C would make it slower, as we'd just be adding the compilation time to the execution time. JIT-compiling Python would be fast, as compilation + executing machine code can often be faster than interpreting, especially since the JIT has no need to write to a file (disk writing is expensive, memory/RAM/register writing is fast). JITs also improve in speed by being able to optimize on information that is only available at runtime.
Julia: a JIT Compiler that's Just-in-time
A common theme between compiled languages is that they're statically typed. That means when the programmer creates or uses a value, they’re telling the computer what type it is and that information is guaranteed at compile time.
Julia is dynamically typed, but internally Julia is much closer to being statically typed.
function multiply(x, y)
    x * y
end
Here is an example of a Julia function, which could be used to multiply integers, floats, vectors, strings etc. (Julia allows operator overloading). Compiling machine code for all of these cases ahead of time, which is what we'd have to do if we wanted Julia to be a compiled language, is not very productive for a variety of reasons. Idiomatic programming means that the function will probably only be used with a few combinations of types, and we don't want to compile something that we don't use yet since that's not very jitty (this is not a real term).
If I call multiply(1, 2), Julia will compile a function that multiplies integers. If I then write multiply(2, 3), the already-compiled code will be used. If I then add multiply(1.4, 4), another version of the function will be compiled. We can observe what the compilation does with @code_llvm multiply(1, 1), which generates LLVM Bitcode (not quite machine code, but a lower-level Intermediate Representation).
define i64 @julia_multiply_17232(i64, i64) {
top:
; ┌ @ int.jl:54 within `*'
   %2 = mul i64 %1, %0
; └
ret i64 %2
}
And with multiply(1.4, 4), you can see how complicated it can get to compile even one more function. In AOT-compiled Julia, all of these combinations (some optimizations could reduce the number) would have to live in the compiled code even if only one was ever used, along with the control flow to delegate between them.
define double @julia_multiply_17042(double, i64) {
top:
; ┌ @ promotion.jl:312 within `*'
; │┌ @ promotion.jl:282 within `promote'
; ││┌ @ promotion.jl:259 within `_promote'
; │││┌ @ number.jl:7 within `convert'
; ││││┌ @ float.jl:60 within `Float64'
%2 = sitofp i64 %1 to double
; │└└└└
; │ @ promotion.jl:312 within `*' @ float.jl:405
%3 = fmul double %2, %0
; └
ret double %3
}
The general strategy of “assume a type and compile/behave based on that” is called type inferencing, which Julia mildly uses in the examples above. There are a lot of other compiler optimizations that are made, though none of them are very specific to JITs as Julia may be better described as a lazy AOT compiler.
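As a toy way to picture the "compile a specialized version the first time a new combination of argument types shows up, then reuse it" behaviour described above, here is a sketch in Python (nothing like Julia's real machinery, just the caching idea):
# Hypothetical sketch: cache "compiled" specializations keyed by argument types.
compiled_cache = {}

def specialize(types):
    # Pretend this is where machine code for exactly these types gets generated.
    return lambda x, y: x * y

def multiply(x, y):
    key = (type(x), type(y))
    if key not in compiled_cache:
        compiled_cache[key] = specialize(key)  # expensive, happens once per signature
    return compiled_cache[key](x, y)           # cheap on every later call

multiply(1, 2)    # compiles an (int, int) version
multiply(2, 3)    # reuses it
multiply(1.4, 4)  # compiles a separate (float, int) version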
The simplicity of this kind of jitting makes it easy for Julia to also supply AOT compilation. It also helps Julia to benchmark very well, definitely a tier above languages like Python and comparable to C (I'd cite numbers, but those are always nuanced and I don't want to get into that).
So What is a JIT? Take Two.
Julia is actually the jittiest JIT I'll discuss, but not the most interesting as a "JIT". It compiles code right before the code needs to be used, just in time. Most JITs, however (Pypy, Java, JS engines), are not really about compiling code just-in-time, but about compiling optimal code at an optimal time. In some cases that time is actually never. In other cases, compilation occurs more than once. In the vast majority of cases compilation doesn't occur until after the source code has been executed numerous times, and the JIT will stay in the interpreter as long as the overhead of compilation is too high to be worthwhile.
The other aspect at play is generating optimal code. Assembly instructions are not created equal, and compilers will put a lot of effort into generating well-optimized machine code. Usually, it is possible for a human to write better assembly than a compiler (though it would take a fairly smart and knowledgeable human), because the compiler cannot dynamically analyze your code. By that, I mean things like knowing the possible range of your integers or what keys are in your map, as these are things that a computer could only know after (partially) executing your program. A JIT compiler can actually do those things because it interprets your code first and gathers data from the execution. Thus, JITs are expensive in that they interpret, and add compilation time to execution time, but they make it up in highly optimised compiled code. With that, the timing of compilation is also dependent on whether the JIT has gathered enough valuable information.
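Below is a minimal sketch of that "interpret first, count, compile only what gets hot" loop. It's purely illustrative: the threshold, the call counter and the two stand-in functions are all made up, and real JITs like Pypy trace loops and gather far richer profiles than a call count.
# Stand-ins for the hard parts: a slow interpreter path and a compiler that
# turns gathered profile data into optimized machine code.
def interpret(name, args):
    return sum(args)

def compile_with_profile(name):
    return lambda *args: sum(args)

HOT_THRESHOLD = 1000   # how many executions before compiling is worth it
call_counts = {}
compiled = {}

def execute(name, args):
    if name in compiled:
        return compiled[name](*args)        # fast path: already-compiled code
    call_counts[name] = call_counts.get(name, 0) + 1
    if call_counts[name] >= HOT_THRESHOLD:
        # By now the interpreter has also gathered profiling data (observed
        # types, branch frequencies, ...) for the compiler to optimize against.
        compiled[name] = compile_with_profile(name)
    return interpret(name, args)            # slow path: keep interpreting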
The cool part about JITs is that I was sort of lying when I said a JIT implementation of C could not be faster than existing compiled implementations. It would not be feasible to try, but jit-compiling C in the way I just described is not a strict superset of compiling a language, and thus it is not logically impossible for the generated code to be fast enough to make up for the compile + profile + interpreting time. If I "JIT compiled" C similarly to how Julia does it (statically compiling each function as it's called), it would be impossible to make it faster than compiled C, as the compile time is non-negative and the generated machine code is essentially the same.
Pogo
Though jitting C is not feasible, one can find a middle ground through Profile Guided Optimization (PGO, cutely [and uncommonly] pronounced “pogo”). Instead of profiling while executing, you compile a program with profiling instrumentation, run it, and then recompile the original program with the collected profile data passed back in. This is effective at reducing compiled-code size and improving branch prediction.
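The workflow looks roughly like this, sketched here as a Python script driving GCC (the flags are GCC's -fprofile-generate/-fprofile-use; main.c and the training input are hypothetical stand-ins):
import subprocess

# 1. Build an instrumented binary that records a profile as it runs.
subprocess.run(["gcc", "-O2", "-fprofile-generate", "main.c", "-o", "main_instrumented"], check=True)
# 2. Run it on representative input to collect the profile data.
subprocess.run(["./main_instrumented", "representative_input.txt"], check=True)
# 3. Rebuild, letting the compiler optimize against the recorded profile.
subprocess.run(["gcc", "-O2", "-fprofile-use", "main.c", "-o", "main_optimized"], check=True)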
Warm it up
JITs have a concept of warming up. Because interpretation and profiling time is expensive, JITs will start by executing a program slowly and then work towards "peak performance". For JITs with interpreted counterparts like Pypy, the JIT without warmup performs much worse at the beginning of execution due to the overhead of profiling. It's also the reason that JITs will consume significantly more memory.
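If you want to see warmup for yourself, a rough way is to time the same chunk of work repeatedly in one process; under a warming-up JIT such as Pypy the earliest iterations tend to be the slowest, while under CPython the times stay roughly flat (exact numbers will of course vary by machine and version):
import time

def work():
    total = 0
    for i in range(1_000_000):
        total += i
    return total

for i in range(10):
    start = time.perf_counter()
    work()
    # Early iterations include interpretation + profiling + compilation time.
    print(f"iteration {i}: {time.perf_counter() - start:.4f}s")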
Warmup adds complexity to measuring the efficiency of a JIT! It's fine if you're measuring the performance of generating the Mandelbrot set, but it becomes painful if you're serving a web application and the first N requests are painfully slow. It’s complicated by the fact that the performance doesn’t strictly increase: if Pypy decides it needs to compile many things all at once after it has already compiled some functions, then you might have a slow-down in the middle. It also makes benchmark results more ambiguous, as you have to check whether the jitted languages were given time to warm up, but you’d also want to know if it took an unseemly amount of time to warm up. Optimizing your compiled code and warmup speed is unfortunately zero-sum (or at least small-sum) by nature. If you try to get your code to compile sooner, less data will be available, the compiled code will not be as efficient, and peak performance will be lower. Aiming for higher peak performance, of course, often means higher profiling costs.
Java and JavaScript engines are examples of JITs that have put a lot of care into warmup time, but you may find that languages built for academic uses have monstrous warmup times in favour of snazzy peak performance.