5 Virtual Machines

How Pharo works internally

Note: we refer to Pharo, but this also applies to Squeak, because the mechanisms are the same. Their VMs are fully compatible, when they are not exactly the same.

When you run an instance of Pharo, you are actually running a virtual machine with the specified image passed as an argument. Pharo relies on a VM in the sense that its executable contains an interpreter with its own instruction set (called its bytecode), to which all Smalltalk methods are compiled. When a Smalltalk method needs to be executed, the virtual machine looks up the method's bytecodes and interprets them one by one. You can see the bytecodes within the image; try exploring

Integer >> #timesRepeat:

This will show you the instance of CompiledMethod that gets executed when you evaluate something like 20 timesRepeat: aBlock. You can even see the bytecodes there.

The term Virtual Machine has been given diverse meanings, so when talking about Pharo's virtual machine we may use a term taken from A Tour of the Squeak Object Engine: Object Engine (OE), which fits much better.
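The "look up the bytecodes and interpret them one by one" cycle described above can be sketched as a small loop. This is only an illustration written in Python: the opcode names and the (opcode, operand) encoding are invented for the sketch and do not correspond to Pharo's real bytecode set or to the actual interpreter code.

```python
# Toy opcodes invented for this sketch; NOT Pharo's real bytecode set.
PUSH_CONST, DUP, POP, RETURN_TOP = range(4)


def interpret(bytecodes, literals):
    """Run a list of (opcode, operand) pairs and return the answered value."""
    stack = []
    pc = 0  # index of the next bytecode to execute
    while pc < len(bytecodes):
        opcode, operand = bytecodes[pc]
        pc += 1
        if opcode == PUSH_CONST:
            # operand is an index into the method's literal table
            stack.append(literals[operand])
        elif opcode == DUP:
            stack.append(stack[-1])
        elif opcode == POP:
            stack.pop()
        elif opcode == RETURN_TOP:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {opcode}")
    raise RuntimeError("method ended without returning")


# A hypothetical "compiled method": push two literals, drop one, answer the other.
method = [(PUSH_CONST, 0), (PUSH_CONST, 1), (POP, None), (RETURN_TOP, None)]
result = interpret(method, literals=[20, "ignored"])
```

Here `result` is the first literal, because the second one is popped before the return. A real interpreter also has bytecodes for sending messages, accessing variables and jumping, which this sketch leaves out.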

But where is Pharo's OE source?

The OE you execute each time you run Pharo is programmed partly in Smalltalk and partly in C. It would be nice to have all of it written in Smalltalk, but for many reasons that is neither possible nor desirable today. A not-so-small amount of the code consists of platform-specific support files, so it is easier to write them directly in C. We call this code the handwritten C part (HC). The rest of the OE is written in Smalltalk or, to be more precise, in Slang, a subset of Smalltalk that can be easily translated to C. We call this the Slang code (SC). VMMaker later translates the SC to C, and we call the result the automatically generated C code (GenC). GenC is then used in conjunction with HC to build the OE.


The Slang code is inside the VMMaker package, which is explained in the next section. The HC code is hosted at www.squeakvm.org, where you can also get a copy of specific versions of the GenC, in case you don't want to mess with VMMaker.

Native vs. Object Parts

Compiling the OE generates a native executable file and many libraries (one for each external plugin). These files contain the native executable bits of the primitive parts of the OE. That includes:

  • Bytecode interpretation
  • Numbered and named primitives
  • Plugin primitives

As we said, methods are compiled into CompiledMethod instances in the image, which hold, among other things, a list of bytecodes. Thus, when a method is activated, its bytecodes have to be interpreted; that is the basis of the execution model.

Inside the OE native executable code, this bytecode interpretation consists mainly of managing the OE state, like pushing and popping arguments to and from the stack, and sending messages. If you read carefully, you'll notice that there is an important group of things that is not implemented as bytecodes: primitives. There isn't any bytecode for things like adding two integers, but there is one to send a message; the send determines whether the method to be activated is a Smalltalk one or whether it calls a numbered primitive, a named primitive or even a plugin primitive. Besides the interpreter, the executable also contains the compiled code of these groups of primitives. Together, that is roughly all the executable code you get when you compile.

But all this native code needs support from the image. That is, you need methods that make the interpreter call the primitives, and objects on which to perform them. For example, SmallInteger has a lot of methods that contain hardly any Smalltalk code, only a reference to the primitive that does what the method's name says.

SmallInteger>>#+ aNumber 

<primitive: 1>
^ super + aNumber

Note: code after the <primitive: *> is a kind of fallback code that will be executed if and only if the primitive fails.
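The rule in the note above (try the primitive first, run the Smalltalk code only if the primitive fails) can be sketched as follows. This is a Python illustration under stated assumptions, not the OE's real code: the `Method` record, the `PRIMITIVES` table, the `PrimitiveFailed` exception and the 31-bit small-integer range are all made up or assumed for the example.

```python
# Assumed small-integer range for this sketch (31-bit, as on a 32-bit VM).
SMALL_INT_MIN, SMALL_INT_MAX = -2**30, 2**30 - 1


class PrimitiveFailed(Exception):
    """Signals that the native fast path could not handle its arguments."""


def is_small_int(value):
    return isinstance(value, int) and SMALL_INT_MIN <= value <= SMALL_INT_MAX


def primitive_add(receiver, argument):
    # Stands in for native code: succeeds only when both operands and
    # the sum fit in the small-integer range.
    if not (is_small_int(receiver) and is_small_int(argument)):
        raise PrimitiveFailed
    result = receiver + argument
    if not is_small_int(result):
        raise PrimitiveFailed
    return result


# Hypothetical table mapping primitive numbers to native routines.
PRIMITIVES = {1: primitive_add}


class Method:
    """Hypothetical stand-in for a CompiledMethod with an optional primitive."""

    def __init__(self, body, primitive=None):
        self.body = body            # fallback code (the part after <primitive: N>)
        self.primitive = primitive  # primitive number, or None


def activate(method, receiver, argument):
    if method.primitive is not None:
        try:
            return PRIMITIVES[method.primitive](receiver, argument)
        except PrimitiveFailed:
            pass  # fall through to the fallback code
    return method.body(receiver, argument)


fallback_calls = []


def fallback_add(receiver, argument):
    # Plays the role of "^ super + aNumber": a slower, more general path.
    fallback_calls.append((receiver, argument))
    return receiver + argument


plus = Method(body=fallback_add, primitive=1)
fast = activate(plus, 3, 4)              # handled by the primitive
slow = activate(plus, SMALL_INT_MAX, 1)  # primitive fails, fallback runs
```

After running this, `fallback_calls` holds a single entry, for the overflowing addition only; the first send never reached the fallback code.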


The idea is that for each group of primitives you'll have an object model onto which to apply them by sending messages that call primitive code. You may also notice that after compiling the OE you don't need the Interpreter and Plugin classes anymore (unless you want to change the primitives and recompile). The only thing you need are the classes that make use of the primitives, and that's the reason why VMMaker isn't included in the image by default.


Some time ago, Eliot Miranda started to work on a new VM for Squeak (and all its forks, like Pharo). This VM is called CogVM and it has already been released. In fact, it is the default VM included in PharoOneClick 1.1.1 and beyond. Cog VM includes a lot of new features; at a glance:

  1. Real and optimized block closure implementation. This is why from the image side blocks are now instances of BlockClosure instead of BlockContext.
  2. Context-to-stack mapping.
  3. JIT (just-in-time) compiler that compiles methods to machine code.
  4. PIC (polymorphic inline caching).
  5. Multi-threading.

If you have been around Smalltalk for a while, you may have heard about Cog VM and Stack VM. What is the big difference? Stack VM implements 1) and 2). Cog VM is built on top of the Stack VM and adds 3) and 4). Finally, there is CogMT VM, which is built on top of Cog VM and adds multi-threading support for external calls (like FFI, for example).
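To give an idea of what item 4), polymorphic inline caching, is about: each place in the code that sends a message remembers which method was found for the receiver classes it has already seen, so repeated sends can skip the full method lookup. The following Python sketch only illustrates that general idea; the `SendSite` class, its size limit and the shape classes are invented for the example, and Cog itself does this inside JIT-generated machine code rather than with a dictionary.

```python
class SendSite:
    """One message-send location with its own small cache (hypothetical)."""

    MAX_ENTRIES = 4  # arbitrary limit chosen for this sketch

    def __init__(self, selector):
        self.selector = selector
        self.cache = {}   # receiver class -> method found by an earlier lookup
        self.lookups = 0  # number of full (slow) lookups performed

    def lookup(self, cls):
        # Full lookup: walk the class hierarchy until the selector is found.
        self.lookups += 1
        for klass in cls.__mro__:
            if self.selector in vars(klass):
                return vars(klass)[self.selector]
        raise AttributeError(self.selector)

    def send(self, receiver, *args):
        cls = type(receiver)
        method = self.cache.get(cls)
        if method is None:
            method = self.lookup(cls)
            if len(self.cache) < self.MAX_ENTRIES:
                self.cache[cls] = method
        return method(receiver, *args)


class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return 3 * self.radius * self.radius  # integer approximation, for simplicity


class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


site = SendSite("area")
shapes = [Circle(1), Square(2), Circle(2), Square(3)]
areas = [site.send(shape) for shape in shapes]
```

Four sends go through `site`, but only two full lookups happen (one per receiver class); the other two are answered from the cache.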

Cog VMs have increased performance by around 4x-10x. In addition, the work on Cog has improved VMMaker quite a lot.

Licensed under Creative Commons BY-NC-SA | Published using Pier