Calling Conventions
- Part 1 - Journey through the .NET internals - Sorting
- Part 2 - List.Sort internals
- Part 3 - Array.Sort && TrySZSort
- Part 4 - Managed vs Unmanaged code and interop
- Part 5 - This Article
In this blog post we will answer the question: what is a calling convention? A calling convention is like a contract that describes how functions call each other at the assembly level, using CPU instructions.
Calling Convention
It defines things like:
- the way arguments are passed to a function
- how values are returned
- how the function name is decorated
- who (caller or callee) handles stack or register clean-up
It specifies how (at a low level) the compiler will pass input parameters to the function and retrieve its results once it’s been executed. 1
CPU, Machine code and instruction sets
If we go down to the lowest level of code, there is machine code 2.
This, BTW, is Fibonacci number generation code in machine code. I wouldn’t be able to write it that way, but what matters is that at this lowest level it really doesn’t matter whether the code comes from C++, Java, Python or C#. Writing code that way would be an (almost) impossible task. That is why we have a higher abstraction on top of machine code: assembly language.
The example below is the same Fibonacci number generation code, but in assembly.
This level is still very low. We operate very close to the CPU, using registers, the stack, and CPU instructions like mov or jmp. Every CPU supports different registers and instructions 3.
The first microprocessor 4 had 46 instructions 5. These days you can check this list 6; there are hundreds of them. It all started with simple instructions, which were combined to build more complex operations. As these operations became very common, CPU designers added them as new instructions, often designing CPUs to execute them more efficiently.
Then there is also a difference between ARM (RISC) and x86 (CISC) processors. The former have a smaller number of instructions and require fewer transistors, which makes them more power efficient 7.
You can check the difference down below.
It is the same code, but for different CPU families with different instruction sets. Because of this difference, you need to compile the code for a specific machine. If you are familiar with the Linux world, it is a pretty standard procedure to download the source code of a program and build it yourself, on your machine, for your machine-specific context. More popular distributions have packages with pre-compiled binaries. Usually, when you go to the release page of some software (for example ripgrep 8), you will see different binaries for different operating systems, kernels or families of CPUs. (BTW, ripgrep is an amazing replacement for grep.)
This is partially why virtual machines were created, with platforms like Java or .NET. They help with software portability: instead of compiling your code to a specific instruction set, you compile it to an intermediate language (IL or Java bytecode), which is then compiled, usually lazily and on the fly, by the virtual machine for the specific machine it runs on. It automates the whole process of building the code for your machine.
Functions in assembly
At this low level we operate with CPU instructions. The concepts of a function, an argument, or returning a value from a function don’t exist. We can only use simple primitives like the accumulator, registers, the stack, labels and CPU instructions. These primitives can be used to build more complex code, including something similar to functions.
This code is readable, and it has the concepts of types (int), a function, arguments, the + operator, return and, of course, scope ({}).
When you compile this code to assembly, you get a different view, with things like labels (sum:), CPU instructions (mov, add, ret), operations on the stack ([esp+4]), the stack pointer (esp) and registers (edx, eax). It is a completely different world.
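As a minimal sketch of the contrast described above, assuming a two-argument function called sum (the name comes from the sum: label; the body is my assumption): the C source is first, and the comment under it shows roughly the kind of assembly a 32-bit x86 build may produce. Treat the assembly as illustrative only, since the exact output depends on the compiler, its version and the flags used.

```c
/* Hypothetical example: a two-argument sum, assumed for illustration. */
int sum(int a, int b)
{
    return a + b;
}

/*
 * Roughly what a 32-bit x86 compiler may emit for sum (Intel syntax).
 * Illustrative only; real output depends on compiler, version and flags.
 *
 * sum:
 *     mov edx, [esp+4]   ; first argument, just above the return address
 *     mov eax, [esp+8]   ; second argument
 *     add eax, edx       ; eax = a + b
 *     ret                ; jump back to the caller; the result stays in eax
 */
```

Nothing in the assembly says "these are the arguments" or "this is the result". Both sides simply have to agree on where to look.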
Looking at this code you might ask:
- OK, I see a ret instruction, which I assume means return, but how does it work?
- Which value is returned?
- If I call it, how will another function know how to get the value?
And that is why we have calling conventions: to create a contract that tells functions how to call each other.
Calling conventions can differ in many ways:
- where are the arguments stored - registers, stack
- where do you put the result of the function call (stack, register, memory)
- who is responsible for clean-up: caller or callee (this makes a difference in assembly code size; if the caller cleans up the stack, the compiler has to generate clean-up instructions next to every function call)
- who is responsible for cleaning up registers and bringing them back to their previous state (before the function was called)
You can check the list of x86 calling conventions here 9. We will use cdecl and fastcall as examples.
CDECL and FASTCALL
If a function expects to be called using the cdecl convention, it expects:
- arguments to be on the stack
- the caller to clean up the stack
If we then call this function using the fastcall convention, neither requirement will be met:
- with fastcall, the first three (for Microsoft, two) arguments are kept in registers
- the stack won’t be cleaned up, as fastcall assumes that the callee is responsible for that.
Source code 10.
This simple function multiplies numbers. We have a function, cdecl, which is marked with the cdecl attribute to force this calling convention (this is actually the default, so the attribute is not needed).
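A minimal sketch of what such a function could look like, assuming a GCC-compatible compiler. The names multiply_cdecl and call_multiply_cdecl and the CDECL macro are hypothetical and not taken from the linked source. The macro is only there so the snippet also builds on targets where the attribute has no meaning.

```c
/* CDECL is a hypothetical helper macro: the attribute is only meaningful
 * on 32-bit x86 with GCC-compatible compilers, so it expands to nothing
 * everywhere else. */
#if defined(__GNUC__) && defined(__i386__)
#define CDECL __attribute__((cdecl))
#else
#define CDECL
#endif

/* Callee: under cdecl it reads both arguments from the stack and
 * leaves the result in eax. It does not remove the arguments. */
CDECL int multiply_cdecl(int a, int b)
{
    return a * b;
}

/* Caller: under cdecl it pushes the arguments before the call and
 * is responsible for cleaning the stack afterwards. */
int call_multiply_cdecl(void)
{
    return multiply_cdecl(2, 3);
}
```

In a 32-bit build the caller's clean-up typically shows up as an adjustment of esp right after the call instruction.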
I am compiling this code with these flags:
- -m32 - forces a 32-bit executable; without this flag the calling convention attributes are ignored (on x86-64 the compiler ignores cdecl and fastcall, since the 64-bit ABI defines its own convention)
- -O0 - I don’t want to optimize this code, because with such a simple example -O1 replaces the call in the caller with a static value (2 * 3 = 6)
- -fomit-frame-pointer - one optimization that removes frame pointers to make the asm code a bit simpler. (At the end of this post there is an example without this optimization, explained, if you are curious what the difference is.)
-fomit-frame-pointer
Don’t keep the frame pointer in a register for functions that don’t need one. This avoids the instructions to save, set up and restore frame pointers; it also makes an extra register available in many functions. It also makes debugging impossible on some machines. 11
It removes these instructions.
This simplifies the code to this form.
For comparison, let’s look at fastcall.
Source code 12.
I added a third parameter to show that only the first two arguments are passed through the registers.
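A sketch of a three-parameter fastcall function, again assuming a GCC-compatible compiler. The names multiply_fastcall and call_multiply_fastcall and the FASTCALL macro are hypothetical. With GCC's fastcall on 32-bit x86, the first argument goes in ecx, the second in edx, and the rest on the stack.

```c
/* FASTCALL is a hypothetical helper macro: the attribute is only meaningful
 * on 32-bit x86 with GCC-compatible compilers, so it expands to nothing
 * everywhere else. */
#if defined(__GNUC__) && defined(__i386__)
#define FASTCALL __attribute__((fastcall))
#else
#define FASTCALL
#endif

/* Callee: with GCC's fastcall on 32-bit x86, a arrives in ecx, b in edx,
 * and c on the stack. The callee removes c from the stack on return. */
FASTCALL int multiply_fastcall(int a, int b, int c)
{
    return a * b * c;
}

/* Caller: loads the first two arguments into registers, pushes the third,
 * and does no stack clean-up after the call. */
int call_multiply_fastcall(void)
{
    return multiply_fastcall(2, 3, 4);
}
```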
For simplicity, we can reduce this code to the following.
There is no need to reserve space on the stack, move the values from the registers to the stack and then read them back from the stack. The compiler probably does it for consistency.
Arguments are first saved in stack then fetched from stack, rather than be used directly. This is because the compiler wants a consistent way to use all arguments via stack access, not only one compiler does like that. 13
In the end we will analyse this code.
So this is it: examples of the differences between fastcall and cdecl. What would happen if we mixed conventions? The example below shows what happens when a caller and a callee do not abide by the same convention.
fastcall still thinks that the arguments were passed through registers, and obviously there will be some data in them. It is not the data passed by the caller, as the caller used the cdecl convention and passed the arguments through the stack. This would produce unexpected and hard-to-debug behaviour. That is why calling conventions are important. There is a long history behind them 14 15 16 17.
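Actually calling a function through the wrong convention is undefined behaviour in C, so here is a toy model of the mismatch instead. It is only a sketch under my own assumptions: the Machine struct, the helper names and the leftover register values 7 and 5 are all hypothetical, and the "stack" is a plain array, not real hardware state.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the state shared by caller and callee (not real hardware). */
typedef struct {
    int ecx;        /* first fastcall argument register */
    int edx;        /* second fastcall argument register */
    int stack[8];   /* tiny stack; it grows upward to keep the model simple */
    size_t top;     /* number of values currently on the stack */
} Machine;

static void push(Machine *m, int value)
{
    assert(m->top < sizeof m->stack / sizeof m->stack[0]);
    m->stack[m->top++] = value;
}

/* Caller following cdecl: arguments are pushed right to left,
 * and the argument registers are not touched. */
static void pass_args_cdecl(Machine *m, int a, int b)
{
    push(m, b);
    push(m, a);
}

/* Caller following fastcall: the first two arguments go into ecx and edx. */
static void pass_args_fastcall(Machine *m, int a, int b)
{
    m->ecx = a;
    m->edx = b;
}

/* Callee following fastcall: it only ever looks at the registers. */
static int multiply_reads_registers(const Machine *m)
{
    return m->ecx * m->edx;
}

/* Mismatch: the registers still hold leftovers (7 and 5) from earlier work. */
int mismatched_call(void)
{
    Machine m = { .ecx = 7, .edx = 5 };
    pass_args_cdecl(&m, 2, 3);             /* caller speaks cdecl */
    return multiply_reads_registers(&m);   /* callee speaks fastcall: 7 * 5 */
}

/* Match: both sides agree on fastcall, so the callee sees 2 and 3. */
int matched_call(void)
{
    Machine m = { .ecx = 7, .edx = 5 };
    pass_args_fastcall(&m, 2, 3);
    return multiply_reads_registers(&m);   /* 2 * 3 */
}
```

In this model mismatched_call returns 35 and matched_call returns 6. On a real machine the leftovers would be whatever the previous code happened to leave in the registers, which is exactly what makes such bugs hard to track down.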