This article is Part 0 in a 4-Part Series.

In this blog post we will answer the question What is a calling convention?

TODO:

  • description how machine code looks like and how function work

Calling convention

A calling convention is like a contract that describes how the functions, on the assembly code level, communicate with each other using cpu instructions`.

Contract defines things like:

  • the way arguments are passed to a function
  • how values are returned
  • how the function name is decorated
  • who: caller or calle handles stack or registers cleanup

It specifies how (at a low level) the compiler will pass input parameters to the function and retrieve its results once it’s been executed.1

If we go down to the lowest levels of code, there is a machine code2.

8B542408 83FA0077 06B80000 0000C383
FA027706 B8010000 00C353BB 01000000
B9010000 008D0419 83FA0376 078BD989
C14AEBF1 5BC3

This is btw a Fibonnaacci number generation code in machine code. I wouldn’t be able to write it that way, but what is important is that on this lowest level it really doesn’t matter if this code was created from C++, Java, Python or C#.

It would an impossible task(almost) to write code that way and that is why new abstractions were invented like assembly language. Example below is the same fibonacci number generation code but in assembly.

fib:
    mov edx, [esp+8]
    cmp edx, 0
    ja @f
    mov eax, 0
    ret
    
    @@:
    cmp edx, 2
    ja @f
    mov eax, 1
    ret
    
    @@:
    push ebx
    mov ebx, 1
    mov ecx, 1
    
    @@:
        lea eax, [ebx+ecx]
        cmp edx, 3
        jbe @f
        mov ebx, ecx
        mov ecx, eax
        dec edx
    jmp @b
    
    @@:
    pop ebx
    ret

On this level which is still very low. We operate very close to CPU using - registers, stacks, various operations like mov, jmp. Every machine supports different registers and instructions[x]. Some operate on very primitive instructions and some have more complicated ones. Example [x].

x86
-----

repe cmpsb         /* repeat while equal compare string bytewise */

ARM
-----

top:
ldrb r2, [r0, #1]! /* load a byte from address in r0 into r2, increment r0 after */
ldrb r3, [r1, #1]! /* load a byte from address in r1 into r3, increment r1 after */
subs r2, r3, r2    /* subtract r2 from r3 and put result into r2      */
beq  top           /* branch(/jump) if result is zero                 */

It is the same code but on different architecture with different instruction sets. It is just like diffeences on the high level beetwen languages like C# and Java. Due to this differences you need to compile the code for specific machine. If you are int linux world, it is pretty standard procedure to download source code of some program, tool and build by your own on your machine for your machine specific architecture and instructions sets. More popular distributions have packages with already precompiled binaries. That is why things like java virtual machine and clr were created. They server as a intermidiary beetwen code and machine using byte code and JIT compilation, compiling code on the fly to match the machine.

If in the the end all there is ‘assembly’ then it should be easy to communicate beetwen different languages. Devil is in the details. The machine doesn’t really know which language ultimately generated instructions set that is execcuted by it. But the more we go one level up the more differences are there. There are different compilers, different flags, different languages, different architecture, different contexts and different conventions.

One of the important difference, I want to talk aobut is calling convention. As mentioned earlier it is like a contract exposed by one function to other to tell it how to communicate with it. If function A uses different convention than function B they won’t be able to call each other.

How does machine code works? how does code works? Turning machine?

Calling conventions can differ in many ways:

  • storage of arguments - registers, stack
  • how are the arguments added to the stack - left to right or right to left
  • where do you put result of the function call (stack, register, memory)
  • who is responsible for stack cleanup - caller or calle ( this makes a difference in assembly code size, if caller is cleaning up the stack - the compiler has to generate cleaning logic every time a function is called)
  • who is responsible for cleaning up registers and bringing them back to previous state (before the function was called)

Analysing few examples, based on x86 calling conventions[x].

If one of the functions expects call using cdecl convention. It is expecting:

  • arguments to be on the stack
  • caller cleaning up the stack

If we then call this function using fastcall convention both requirements won’t be met:

  • for fastcall first three (for Microsoft two) arguments are kept in the reggisters, and calle expectes this values on the stack when it is empty
  • stack won’t be cleaned up as fastcall assumes that callee is responsible for that.

cdecl example [x]

__attribute__((cdecl)) int cdecl(int a, int b) {
      return a * b;
}

int caller() {
    return cdecl(2, 3);
}

This is a simple function that all it does is multiply numbers. We have a cdecl which is marked with cdecl attribute to force this calling convention (this is actually default and this attribute is not needed).

I am compiling this code with two important flags:

  • -m32 - forces 32 bit executable - without this flag calling coventions are ignored (couldn’t find why)
  • -O0 - I don’t want to optimize this code as with such a simple example -O1 in the caller puts a static value (2 * 3 = 6)
  • -fomit-frame-pointer - one optimization that removes frame pointers to make the asm code a bit simpler. (At the end of this post there is a example without this optimization explained if you are curious what is the diffference).

-fomit-frame-pointer
Don’t keep the frame pointer in a register for functions that don’t need one. This avoids the instructions to save, set up and restore frame pointers; it also makes an extra register available in many functions. It also makes debugging impossible on some machines. [x]

It removes instructions

  - push ebp      <- preserve the caller function entry point on the stack
  - mov ebp, esp  <- point ebp to this function stack frame pointer (create stack frame)
...
  - pop ebp       <- restore entry point from the stack of the calling function to be able to go back

So how does the asm code looks like with all this flags?

cdecl:
  mov eax, DWORD PTR [ebp+4]  move value 'a' from the stack to the accumulator eax
  imul eax, DWORD PTR [ebp+8] multiply eax by 'b' from the stack
                              -> in cdecl called function expects arguments on the stack
  ret
caller:
  push 3                      push `a` to the stack
  push 2                      push `b` to the stack 
                              -> in cdecl arguments are pushed to the stack
  call cdecl
  add esp, 8                  'clean up' the stack by moving the pointer
                              -> in cdecl caller cleans up the stack
  ret

We can see all the characteristics of cdecl call in this code. I am going to compile now fastcall function using similar flags.

Example of fastcall [x]

__attribute__((fastcall)) int fastcall(int a, int b, int c) {
      return a * b * c;
}

int caller() {
    return fastcall(2, 3, 4);
}

I added third parameter to show that only first two arguments are passed through the registers. There is one thing, I am going to change in the code to make it more readable - fastcall looks like this.

fastcall:
  sub esp, 8                   -> reserve place on the stack
  mov DWORD PTR [esp+4], ecx   -> move `a` to the stack
  mov DWORD PTR [esp], edx     -> move `b` to the stack
  mov eax, DWORD PTR [esp+4]   -> move `a` from stack to eax
  imul eax, DWORD PTR [esp]    -> multiply `a` on eax by `b`
  imul eax, DWORD PTR [esp+12] -> multiply by `c` on the stack
  add esp, 8                   -> clean up the stack
  ret 4                        -> return to the caller and move stack pointer cleaning up `c`

But it can be simplified to this.

fastcall:
  mov eax, ecx
  imul eax, edx
  imul eax, DWORD PTR [esp]
  ret 4

There is no need to reserver place on stack, move values from registers to the stack and then get values from the stack. Compiler potentialy does it due to consistency.

Arguments are first saved in stack then fetched from stack, rather than be used directly. This is because the compiler wants a consistent way to use all arguments via stack access, not only one compiler does like that.[x]

In the end we will analyse this code.

fastcall:
  mov eax, ecx              move `a` to eax 
  imul eax, edx             multiply `a` in the eax by `b` in edx
                            -> in fastcall called function expects arguments in the registers
  imul eax, DWORD PTR [esp] multiply by `a*b` by `c` on the stack, esp is pointing at the top of stack
                            -> in fastcall third parameters is on the stack
  ret 4                     return to the caller and cleanup stack from `c`
                            -> in fastcall called function is cleaning up the stack
fastcall:
  ret
caller:
  push 4          move `c` to the stack
                  -> in fastcall third argument is passed using stack
  mov edx, 3      move `a` to edx
  mov ecx, 2      move `b` to ecx
                  -> in fastcall first 2 arguments are passed by registers
  call fastcall   
  ret

We just discussed differences beetwen fastcall and cdecl but what if caller function used different convention than calle funciton ? Assuming for a while this scenario - the code will look like this.

fastcall:
  mov eax, ecx    -> `fastcall` expects arguments on the registers
  imul eax, edx   
  ret
caller:
  push 3          -> but caller pushed arguments on the stack
  push 2         
  call fastcall
  add esp, 8 
  ret

In this example fastcall function will use somekind of data on the registers. It wouldn’t get the data it was expecting 3 and 2 but something. This would generate an unexpected and hard to debug behaviour. That is why calling conventions are important. There is a long history behind them 3456

Next post in the series will expand on it and describe the link beetwen JIT, CLR, calling conventions and FCall.

https://msdn.microsoft.com/en-us/library/6xa169sk.aspx https://msdn.microsoft.com/en-us/library/984x0h58.aspx

https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html

This code is using no opitmizations -O0 
Other flags used:
-m32 - to force 32 bits as on 64 calling conventions are ignored

cdecl:
  push ebp                    <- preserve the caller function entry point on the stack
  mov ebp, esp                <- point ebp to this function stack frame pointer (create stack frame)
  mov eax, DWORD PTR [ebp+8]  <- move value 'a' from the stack frame to the accumulator eax
  imul eax, DWORD PTR [ebp+12]<- multiply eax by 'b' from the stack frame
  pop ebp                     <- restore entry point from the stack of the calling function to be able to go back
  ret                         <- return to the entry point of the calling function
fastcall:                     
  push ebp                    <- preserve caller function entry point on the stack
  mov ebp, esp                <- point ebp to this function stack pointer (create stack frame)
  sub esp, 8                  <- allocate space for 2 items on the stack frame (using substraction) 
  mov DWORD PTR [ebp-4], ecx  <- put the `a` from ecx register on the stack
  mov DWORD PTR [ebp-8], edx  <- put the `b` from edx register on the stack
  mov eax, DWORD PTR [ebp-4]  <- move `a` to accumulator eax
  imul eax, DWORD PTR [ebp-8] <- multiply by `b`
  mov esp, ebp                <- `destroy` the stack
  pop ebp                     <- restore entry point of the caller
  ret                         <- return to the entry point of the calling function
caller:
  push ebp                    
  mov ebp, esp
  sub esp, 24                 <- make space on the stack
  sub esp, 8
  push 3                      <- push `a` to the stack
  push 2                      <- push `b` to the stack 
  call cdecl
  add esp, 16                 <- 'clean up' the stack
  mov DWORD PTR [ebp-12], eax <- after cdecl call eax contains a*b result of cdecl function, put it on the stack
  mov edx, 3                  <- move `a` to edx register
  mov ecx, 2                  <- move `b` to ecx register
  call fastcall
  mov DWORD PTR [ebp-16], eax <- eax contains result of a*b from fastcall function put it on the stack
  mov edx, DWORD PTR [ebp-12] <- put result of cdecl on the register
  mov eax, DWORD PTR [ebp-16] <- put result of fastcall on the register
  add eax, edx                <- add and leave the value on eax for the caller to use
  mov esp, ebp                <- `destroy` stack
  pop ebp
  ret

With level 1 optimziation -01 this looks even more clear

Diffs: (in the simplified code example)

  • fastcall uses edx and ecx registers
  • cdecl uses stack
  • fastcall using mov esp, ebp cleans up the stack
  • cdeccl doesnt do this the caller is ussing add esp, 16 to clea up the stack

I replaced leave instructions with mov esp, ebp pop ebp

sub esp add esp instructions are nicely explained here stack operations[x]