BEAM Definitions

For specific constant values, bit positions and bit sizes, always refer to the original BEAM sources, file erl_term.h.

Boxed Value

A term value whose primary tag is TAG_PRIMARY_BOXED=2 contains a pointer to data on the heap.

The first word of the box always has the header tag (TAG_PRIMARY_HEADER=0) in its 2 least-significant bits. The next 4 bits hold the subtag, which determines the type of the value inside the box. Knowing this makes it possible to scan the heap and interpret the data found there.

The VM uses boxes to store larger values, which do not fit into the Word size minus 4 bits (for immediate-1) or the Word size minus 6 bits (for immediate-2).

Examples of boxed values: bigint, float, fun, export, heap and refcounted binary, external ports and pids (with host name in them).

Cache Locality

A mythical creature used to scare naughty developers. It may have either a huge impact on your code's performance or very little. Always run performance tests before you start optimizing for this.

Context

The virtual machine context is a small structure which contains a copy of the active registers, the instruction pointer and a few other pointers from the currently running process. It is stored in CPU registers or on the stack (depending on the C compiler's judgement when building the emulator).

When a process is ready to run, its fields are copied into the context; this is called swapping in. When a process is suspended or scheduled out, the context is stored back into the process; this is called swapping out. This is not a cheap operation, so the VM tries to minimize context switches.

A context switch is also performed when changing from BEAM code to native code compiled by HiPE or the JIT. This is why mixing HiPE and normal Erlang code that call each other in your program is an expensive luxury.
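
The swap in and swap out steps can be sketched as two plain copy functions. This is only an illustration: the Process and Context structures and their field names below are hypothetical and far smaller than anything in the real emulator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t Word;

/* Hypothetical, heavily simplified process record (not the real layout). */
typedef struct {
    const Word *ip;        /* saved instruction pointer */
    Word *heap_top;        /* saved heap write position */
    Word *stack_top;       /* saved top of the stack */
    int reductions_left;   /* remaining reduction budget */
} Process;

/* Hypothetical context: the working copy used while the process runs. */
typedef struct {
    const Word *ip;
    Word *heap_top;
    Word *stack_top;
    int reductions_left;
} Context;

/* Swap in: copy the process fields into the context. */
void swap_in(Context *ctx, const Process *p) {
    assert(ctx != NULL && p != NULL);
    ctx->ip = p->ip;
    ctx->heap_top = p->heap_top;
    ctx->stack_top = p->stack_top;
    ctx->reductions_left = p->reductions_left;
}

/* Swap out: store the (possibly changed) context back into the process. */
void swap_out(const Context *ctx, Process *p) {
    assert(ctx != NULL && p != NULL);
    p->ip = ctx->ip;
    p->heap_top = ctx->heap_top;
    p->stack_top = ctx->stack_top;
    p->reductions_left = ctx->reductions_left;
}
```

Between swap_in and swap_out only the context is modified, so the process record is stale until the context is written back.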

CP, Continuation pointer

A raw pointer to a location in prepared BEAM code. It is used as a return address or as a label offset. A CP can only be found on the stack, never in registers or on the heap.

Header tag

When a box is referenced or when a block of memory is stored on the heap, its first Word usually has a few least-significant bits reserved (currently 6). First comes the primary tag (2 bits, TAG_PRIMARY_HEADER=0). Then follows the header tag (4 bits), which defines the type of the contents. The remaining bits may hold other information; often they store the arity (the size of the contents).
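
A minimal sketch of this layout, assuming the 2 + 4 bit split described above. The function names are made up for this example and are not the macros from erl_term.h; check the real sources for the actual header tag values.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t Word;

enum { TAG_PRIMARY_HEADER = 0 };                 /* value quoted in the text */
enum { PRIMARY_BITS = 2, HEADER_TAG_BITS = 4 };  /* sizes quoted in the text */

/* Build a header word: arity | 4-bit header tag | 2-bit primary tag. */
Word make_header(Word arity, Word header_tag) {
    assert(header_tag < (1u << HEADER_TAG_BITS));
    return (arity << (PRIMARY_BITS + HEADER_TAG_BITS))
         | (header_tag << PRIMARY_BITS)
         | TAG_PRIMARY_HEADER;
}

/* True if the 2 least-significant bits carry the header primary tag. */
int is_header(Word w) {
    return (w & 0x3) == TAG_PRIMARY_HEADER;
}

/* The 4 bits that follow the primary tag. */
Word header_tag(Word w) {
    return (w >> PRIMARY_BITS) & 0xF;
}

/* Everything above the 6 reserved bits, here interpreted as the arity. */
Word header_arity(Word w) {
    return w >> (PRIMARY_BITS + HEADER_TAG_BITS);
}
```

With this layout make_header(0, 6) gives 0x18, which is the 0110-00 bit pattern mentioned for the float header under THE_NON_VALUE.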

Immediate

A term value whose primary tag is TAG_PRIMARY_IMMED1=3 contains an immediate value. The two bits that follow the primary tag determine the value type (TAG_IMMED1_* macros). If the immediate-1 tag equals TAG_IMMED1_IMMED2=2, then two more bits are used to interpret the value type (TAG_IMMED2_* macros).

Examples of immediates: small integers, local pids and ports, atoms, the empty list NIL. An immediate value fits into one Word and does not reference any memory.
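
The two-level tagging can be sketched as follows. Only the two constants quoted above are used; the Immediate structure and decode_immediate are hypothetical names for this example, and the payload is returned as raw unsigned bits without interpreting it as an integer, atom index and so on.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t Word;

enum { TAG_PRIMARY_IMMED1 = 3, TAG_IMMED1_IMMED2 = 2 };  /* values from the text */

/* Hypothetical result type for this sketch. */
typedef struct {
    int level;     /* 1 = immediate-1, 2 = immediate-2 */
    Word tag;      /* the 2-bit TAG_IMMED1_* or TAG_IMMED2_* value */
    Word payload;  /* remaining bits: Word size minus 4 or minus 6 */
} Immediate;

Immediate decode_immediate(Word term) {
    Immediate r;
    Word immed1;

    /* Caller must pass a term whose primary tag says "immediate". */
    assert((term & 0x3) == TAG_PRIMARY_IMMED1);

    immed1 = (term >> 2) & 0x3;
    if (immed1 != TAG_IMMED1_IMMED2) {
        r.level = 1;
        r.tag = immed1;
        r.payload = term >> 4;        /* 4 tag bits used */
    } else {
        r.level = 2;
        r.tag = (term >> 4) & 0x3;    /* two more bits select the type */
        r.payload = term >> 6;        /* 6 tag bits used */
    }
    return r;
}
```

This also shows why an immediate-2 value has two bits less room for its payload than an immediate-1 value.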

Heap

A dynamically allocated block of Words used by a process. The heap can be resized either with the help of the ERTS_REALLOC* macros or by allocating a new fragment and moving data there with the garbage collector. Data is placed onto the heap sequentially from its start.

Port

A special value which you receive when you call erlang:open_port. It is hooked to a port driver (built-in or custom). You can send it commands and receive messages from it; you can also close it, link to it and monitor it (monitoring was added in OTP 19).

A port driver manages some resource, such as a file, a socket, a ZIP archive etc. Ports provide your process with a stream of events from some resource, and you can write commands and data to them as well.

Primary Tag

When a term value is encoded, several least-significant bits (currently 2 bits) are reserved to represent the type of the contained term.

The primary tag can be: a box (TAG_PRIMARY_BOXED=2), a list (TAG_PRIMARY_LIST=1), a header (TAG_PRIMARY_HEADER=0) or an immediate (TAG_PRIMARY_IMMED1=3).
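
As a rough sketch, reading the primary tag is a mask over the 2 lowest bits, and a boxed term is a word-aligned pointer with the tag stored in those bits. The tag values are the ones listed above; the helper names (primary_tag, make_boxed, boxed_ptr) are invented for this example and differ from the macros in erl_term.h.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t Word;

enum PrimaryTag {
    TAG_PRIMARY_HEADER = 0,
    TAG_PRIMARY_LIST   = 1,
    TAG_PRIMARY_BOXED  = 2,
    TAG_PRIMARY_IMMED1 = 3
};

/* The primary tag lives in the 2 least-significant bits. */
enum PrimaryTag primary_tag(Word term) {
    return (enum PrimaryTag)(term & 0x3);
}

const char *primary_tag_name(Word term) {
    switch (primary_tag(term)) {
    case TAG_PRIMARY_HEADER: return "header";
    case TAG_PRIMARY_LIST:   return "list";
    case TAG_PRIMARY_BOXED:  return "boxed";
    case TAG_PRIMARY_IMMED1: return "immediate";
    }
    return "unknown";  /* not reachable: 2 bits give exactly 4 values */
}

/* Tag a word-aligned heap pointer as a boxed term. */
Word make_boxed(Word *ptr) {
    assert(((Word)ptr & 0x3) == 0);  /* the low bits must be free */
    return (Word)ptr | TAG_PRIMARY_BOXED;
}

/* Strip the tag to get the heap pointer back. */
Word *boxed_ptr(Word term) {
    assert(primary_tag(term) == TAG_PRIMARY_BOXED);
    return (Word *)(term & ~(Word)0x3);
}
```

The trick works because Words on the heap are aligned, so the 2 lowest bits of a real pointer are always zero and can be borrowed for the tag.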

Reduction

Each instruction or call has a cost, measured in imaginary units called reductions, where 1 reduction is approximately one function call. The cost of other calls and operations is approximated relative to this.

Registers

An array of Words used to pass arguments in a function call. When a recursive call is made, affected registers are also saved onto the stack.

Roots

During garbage collection, the roots are all the values known to be live. They are collected from:

  • the stack

  • the live registers

  • the process dictionary

  • the message queue

  • the group leader and the exception reason fields.

Anything that can be traced by following references from the roots is considered reachable data. This data is moved to the new heap. The previous heap is discarded, because nothing on it can be reached anymore.

Scheduler

A scheduler is a loop which runs on a fixed CPU core. It either fetches and executes the next instruction based on the instruction pointer of the current process, or takes the next process from the queue. As soon as a process has been running for a certain number of reductions (say 2000, but the number may change), it is scheduled out and put to sleep, and the next process takes its place and continues running from where it left off. This allows a form of fair scheduling where everyone is guaranteed a slice of time, no matter how busy some processes are.
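
The reduction budget idea can be sketched with a toy round-robin loop. Everything here is hypothetical: a "process" is just a counter of work left, one loop step stands for one reduction, and 2000 is only the example figure from the paragraph above.

```c
#include <assert.h>
#include <stddef.h>

enum { REDUCTIONS_PER_SLICE = 2000 };  /* example figure; the real value may differ */

/* Hypothetical toy process: no code, just an amount of work to burn. */
typedef struct {
    long work_left;  /* reductions still needed to finish */
    int slices;      /* how many times it was scheduled in */
} Process;

/* Run one process until it finishes or its budget runs out.
 * Returns 1 if the process has finished, 0 if it was scheduled out. */
int run_slice(Process *p) {
    int budget = REDUCTIONS_PER_SLICE;
    assert(p != NULL);
    p->slices++;
    while (budget > 0 && p->work_left > 0) {
        p->work_left--;  /* one reduction of work */
        budget--;
    }
    return p->work_left == 0;
}

/* Round-robin over the queue until every process has finished.
 * Returns the total number of slices handed out. */
int run_all(Process *queue, size_t n) {
    int total_slices = 0;
    size_t unfinished = 0;
    size_t i;

    for (i = 0; i < n; i++)
        if (queue[i].work_left > 0) unfinished++;

    while (unfinished > 0) {
        for (i = 0; i < n; i++) {
            if (queue[i].work_left <= 0) continue;  /* already done */
            total_slices++;
            if (run_slice(&queue[i])) unfinished--;
        }
    }
    return total_slices;
}
```

A process needing 5000 reductions gets three slices here, while a process needing 100 reductions finishes in its first slice and does not have to wait for the busy one to complete.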

Slot

A special tagged value which points at a register, float register, or a stack slot. It is used internally by instructions and never appears in Erlang programs as data.

Stack

A section of the young heap of a process, used by the process as temporary storage and as a return stack. A new process creates a stack which has zero size and begins at heap_end. The stack grows backwards (towards lower memory addresses) until it meets the heap write position (heap_top). At that point the heap is considered full and garbage collection is triggered.

Data on the stack is grouped into Stack Frames.
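
A small sketch of the heap and the stack sharing one block of Words and growing towards each other. The YoungHeap structure and its functions are hypothetical; instead of triggering a garbage collection, this toy version simply reports failure when the two ends meet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t Word;

/* Hypothetical simplified layout of one memory block. */
typedef struct {
    Word *heap_start;
    Word *heap_top;   /* heap write position, grows upwards */
    Word *stack_top;  /* top of the stack, grows downwards */
    Word *heap_end;   /* the stack begins here */
} YoungHeap;

void heap_init(YoungHeap *h, Word *mem, size_t words) {
    assert(h != NULL && mem != NULL);
    h->heap_start = mem;
    h->heap_top = mem;
    h->heap_end = mem + words;
    h->stack_top = h->heap_end;  /* a new stack has zero size */
}

/* Free words between the heap write position and the stack top. */
size_t free_words(const YoungHeap *h) {
    return (size_t)(h->stack_top - h->heap_top);
}

/* Allocate n words on the heap; NULL means "full" (the real VM would run GC). */
Word *heap_alloc(YoungHeap *h, size_t n) {
    Word *p;
    if (free_words(h) < n) return NULL;
    p = h->heap_top;
    h->heap_top += n;
    return p;
}

/* Push one word on the stack; returns 0 when the stack meets the heap. */
int stack_push(YoungHeap *h, Word value) {
    if (free_words(h) < 1) return 0;
    h->stack_top--;
    *h->stack_top = value;
    return 1;
}
```

Because both ends draw from the same gap, a deep stack leaves less room for heap data and the other way around.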

Stack Frame

Functions can create a stack frame by pushing a CP value and reserving several extra words on the stack. Sometimes, when code throws an exception, the VM scans the stack to build a stacktrace and uses these CP values as markers.

Each frame corresponds to a function call. A frame always begins with a CP value, which marks a return address and can be used to find a frame boundary. The rest of the frame is used to store temporary variables and register values between calls.
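
Scanning the stack for CP markers might look roughly like the sketch below. Note the assumption: this example treats any word whose 2 lowest bits are zero as a CP, on the reasoning that a header-tagged word has no other business on the stack. Verify how CP values are really recognised in erl_term.h before relying on this; the function names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t Word;

/* ASSUMPTION for this sketch only: a CP has its 2 lowest bits set to 0. */
int looks_like_cp(Word w) {
    return (w & 0x3) == 0;
}

/* Walk from the stack top (lowest address) towards the stack base and
 * collect every CP found. Each CP marks the beginning of one frame, so the
 * result is a list of return addresses, newest call first.
 * Returns the number of entries written to out (at most max_out). */
size_t collect_return_addresses(const Word *stack_top, const Word *stack_end,
                                Word *out, size_t max_out) {
    size_t n = 0;
    const Word *p;
    assert(stack_top <= stack_end);
    for (p = stack_top; p < stack_end && n < max_out; p++) {
        if (looks_like_cp(*p)) {
            out[n++] = *p;
        }
    }
    return n;
}
```

Everything between two consecutive CP values belongs to one frame, which is how the frame boundaries fall out of the scan.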

Term

A term is any value in Erlang. Internally a term is a Word with a few least-significant bits reserved (2 to 6 bits depending on the value) which define its type. The remaining bits contain either the value itself (for immediate values) or a pointer to data on the heap (for boxed values).

Terminating a Process

An exit or kill signal is sent to a process, and it works similarly to an exception. If the process was able to catch the exit signal (trap_exit), nothing else happens.

A process that is going to die will free its memory, trigger all monitors and links, leave the process queue and get unregistered from the process registry.

THE_NON_VALUE

An internal value used by the emulator; you will never be able to see it from Erlang. It marks an exception or a special type of return value from BIF functions, and it is also used to mark memory during garbage collection.

Depending on whether the DEBUG macro is set and HiPE is enabled, THE_NON_VALUE takes the value of a primary float header (6 least-significant bits are 0110-00) with the remaining bits set to either all 0 or all 1. Otherwise it is an all-zero-bits Word (which would mark a zero-arity tuple on the heap); this can never appear in a register, which makes it useful as a special return value.

Word

A machine-dependent register-sized unsigned integer. It is 32 bits wide on a 32-bit architecture and 64 bits wide on a 64-bit architecture. In the BEAM source code a Word can be unsigned (UWord) or signed (SWord).
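
One way to model such a type in portable C is with the pointer-sized integer types from stdint.h. This is an approximation made for this example, not how the BEAM sources define UWord and SWord, and it assumes that pointer size matches register size on the target platform.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Assumption: pointer-sized integers stand in for register-sized ones. */
typedef uintptr_t UWord;  /* unsigned Word */
typedef intptr_t  SWord;  /* signed Word */
typedef UWord     Word;

/* Width of a Word in bits on the platform this is compiled for. */
unsigned word_bits(void) {
    return (unsigned)(sizeof(Word) * CHAR_BIT);
}

/* Bits left for the value once tag_bits are reserved for the tag,
 * e.g. 4 for immediate-1 and 6 for immediate-2. */
unsigned payload_bits(unsigned tag_bits) {
    assert(tag_bits <= word_bits());
    return word_bits() - tag_bits;
}
```

On a typical 64-bit build payload_bits(4) is 60, which is the room an immediate-1 value has before it must be stored in a box.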