In the last post we have discovered what are calling conventions and how does code operate on the machine code level. With this knowledge we can finally wrap up and finish discussing FCall.

Why FCall is special

The answer to that question can be found in CoreClr 1

An FCall target uses __fastcall or some other calling convention to match the IL calling convention exactly. Thus, a call to FCall is a direct call to the target without no intervening stub or frames.

We already discussed __fastcall which uses registers to pass arguments (first 2). It is a bit faster compared to other conventions like JIT calling convention. This one uses stack to pass arguments and return values. Stack operations use precious cpu cycles for setup and cleanup. It is a micro-optimisation and I don’t have any benchmarks but __fastcall should faster. 23

FCall is a direct call and no stubs or frames are interviening with it. When it comes to optimization and cpu cycles the less is more, but we need to discuss stubs and frames briefly.

What is a stub

The first thing that comes to my mind in relation to stub is Unit Test. Simple objects used to control the context of Unit under test. Stubs in the CLR are completetly different. In CLR world stub is a small helper code. There are many stubs used for different reasons.

For instance prestub is used in Just in time compilation process. When C# code is compiled, every method changes to IL and is kept in DLL or EXE file. You can’t run IL straight away, it is not machine code. It has to be compiled once again by the runtime. When you start your program it starts main Thread and loads up Execution Engine. The first function to run is your Main function. It is kept along with other functions in your EXE file in a big table containing functions names, functions descriptions, metadatada, IL code and prestub. Il code cannot be the entrypoint as it is not yet a machine code - but prestub is and it serves as a main entrypoint.

prestub contains machine code that calls runtime and orders it to compile it. Runtime takes the IL code compiles it and cleverly injects into the same memory address, replacing prestub with actual machine code. This process repeats itself and when Main function calls another function it again calls a prestub which tells the runtime to compile it. Proceess is repeated all the time and that is how Just In Time compilation is achieved. This is a very basic description of whole process as there is a JIT caching layer which helps to overcome some JIT overhead as it uses cpu cycles obviously. There are tools like NGEN that lets up compile aall the stubs and generate machine code for all the methods (use it only when you know what to do - JIT should be used in most of the scenarions) 4

There are more stubs in the CLR and this are used in different scenarions like:

  • implementation of generics and dynamic code
  • marshalling in P/Invoke
  • security checks
  • exception handling
  • support for multiple calling conventions
  • legacy code adapters

What is a frame

There are many usages of word frame the one I am mostly used to is OSI model frame. It is a data structure that holds sender/receiver information and the packet - payload.

Network frame image

When reading CLR code and documentations you can encounter multiple references to frame. It gets confusing as this word is used for many different things. There is a stack frame, a region on the stack that holds the context of method call: arguments, local variables, return values.

Stack frame

UThere is a exception handler frame used in exception handling and execution engine frame, this one is interesting as it is used as a structure to hold various metadata used by the runtime to generate exection context. There are many EE frames 5.

FCall is faster beacuse it limits number of frames and stubs it requires to operate thus saving cpu cycles:

  • there is no pre stub jitted code calls directly FCall entry point
  • as FCall is inside execution engine and matches IL calling convention not requiring marshalling stub to help with the communcation
  • number of frames used with FCall is smaller, you need to manually create frames in order to throw exception or call garbage collection 6 (but when you do this QCall is faster and preferred choice 7)