- Part 1 - Journey through the internals of .NET Sort
- Part 2 - Everything you wanted to know about Sorting in .NET - part I
- Part 3 - This Article
- Part 4 - Everything you wanted to know about Sorting in .NET part 2
- Part 5 - Everything you wanted to know about Sorting in .NET part 6
- Part 6 - Everything you wanted to know about Sorting in .NET part 5
- Part 7 - Everything you wanted to know about Sorting in .NET part 4
The last part ended with a sneak peek of native code. This part is going to expand on it. We are entering the C++ world inside the CLR. First things first: why do all this complicated stuff instead of having TrySZSort in managed code?
Why is part of the Sort in the native code?
The native TrySZSort uses IntroSort internally. As we discovered at the beginning of this series, when a custom IComparer is provided, a managed version of IntroSort is used instead. Shall we check what the difference would be?
I am going to use BenchmarkDotNet for that. It is a very good library that gives you a lot of data about your code: allocations, timing, etc.
Looks like there is a 4-5x difference. In a nutshell, as with every complex system, you take it as it is. Managed code gives you a lot of security and gets rid of many problems, but there is a price to pay. You give up control and lose flexibility to optimize, and the features that give you those nice things cost time and CPU power. You cannot do certain optimizations in managed code. If you ask whether native code is always faster than managed code, the answer is difficult, as it depends on too many variables. Making native code run faster also has a cost: it takes time and requires knowledge and expertise. Also, managed JIT compilers keep getting more interesting optimizations that make the code run faster.
In the case of TrySZSort, the native code is clearly optimized and takes advantage of being native, but it is code that is harder to maintain and reason about. OK then, let's go down the rabbit hole and discuss how to establish a connection between managed and unmanaged code.
Calling unmanaged code
To call unmanaged code you need to use the extern keyword. It tells the runtime that the implementation of the function is in a different (external) place. There are two ways to call external code.
P/Invoke (Platform Invocation)
Platform invocation is the mechanism provided by the common language runtime to facilitate the calls from managed code to unmanaged code functions. Behind the scenes the runtime constructs the so-called stub, or thunk, which allows the addressing of the unmanaged function and conversion of managed argument types to the appropriate unmanaged types and back. This conversion is known as parameter marshalling.[x]
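To make the idea of a stub and parameter marshalling a bit more concrete, here is a minimal C++ sketch. It is not how the CLR builds its stubs; NativeCountChar and CountCharStub are hypothetical names invented for illustration. The stub converts a richer caller-side view (std::string, bool) into the C-style types the native function expects, calls it, and converts the result back.

```cpp
#include <cctype>
#include <cstdint>
#include <string>

// Hypothetical "unmanaged" function with a C-style signature: it takes a
// NUL-terminated char pointer and a 32-bit integer flag instead of a bool.
extern "C" std::int32_t NativeCountChar(const char* text, char needle,
                                        std::int32_t ignoreCase) {
    std::int32_t count = 0;
    for (const char* p = text; *p != '\0'; ++p) {
        unsigned char a = static_cast<unsigned char>(*p);
        unsigned char b = static_cast<unsigned char>(needle);
        if (ignoreCase != 0) {
            a = static_cast<unsigned char>(std::tolower(a));
            b = static_cast<unsigned char>(std::tolower(b));
        }
        if (a == b) ++count;
    }
    return count;
}

// Hypothetical stub (thunk): marshals the arguments to the native types,
// forwards the call, and marshals the result back to the caller's type.
int CountCharStub(const std::string& text, char needle, bool ignoreCase) {
    const char* nativeText = text.c_str();               // string -> const char*
    const std::int32_t nativeFlag = ignoreCase ? 1 : 0;  // bool -> 32-bit flag
    const std::int32_t nativeResult =
        NativeCountChar(nativeText, needle, nativeFlag);
    return static_cast<int>(nativeResult);               // int32 -> int
}
```

The caller only ever sees CountCharStub; the type conversions happen in one place, which is roughly the role the runtime-generated stub plays for a P/Invoke call.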
To call unmanaged code using P/Invoke, the function has to be extern and have the DllImport attribute. This attribute takes the DLL library file name as a parameter. DLL files are dynamic-link libraries containing compiled code. These files expose an Export Address Table that holds entry points to functions. The runtime uses these entry points to call the functions.
In this simple example, I am telling the compiler that Function is in nonexistinglib.dll. This generates IL code with pinvokeimpl, which tells the runtime that this is an unmanaged method called using P/Invoke. The method is available in the library nonexistinglib.dll and has the calling convention winapi. More on calling conventions later in this post, but in a nutshell, a calling convention describes how to call a function, something like a contract. The unmanaged DLL expects a certain contract to be met in order to accept the call to the function.
winapi convention is an alias of
What happens here is that the CLR:
- JITs the code
- finds pinvokeimpl
- finds the entry point to the unmanaged function
- prepares the contract using the winapi calling convention and marshals the parameter int32 i
- calls the function
winapi calling convention to
It is actually a bit more complicated, as there are things like the execution context or a sentinel item put on the stack frame to mark the boundary between managed and unmanaged code, but this blog post tries to draw a
A good real-life example of P/Invoke is FileStream.Read. When you call Read, in the end you are actually calling the Win API DLL KERNEL32 and WIN32.ReadFile. It uses platform invocation to call the OS API and read a file.
InternalCall is a different way to call an unmanaged function. It is a bit more efficient (because it stays inside the closed CLR environment, which makes it possible to relax some time-consuming operations around security, exception handling, etc.), but you cannot create
assemblies. This call can only be used when calling functions implemented in the CLR.
The CLR is not only about the runtime; it also contains optimized code used in many places, like StringBuilder.ToString(). This is the main reason why using StringBuilder is faster and a good practice.
This is a good example of
StringBuilder [x] is called, it is actually calling FastAllocateString further down the stack, code that lives in the CLR and is reached through an internalcall.[[x]frame-allocated-string]
From the IL point of view the code looks a bit different and uses internalcall, which tells the runtime to look for the implementation of the code inside the CLR.
The CLR maintains its own function entry point address table (there will be an example with one function later in the post) that is used to find the entry point. It also uses a special calling convention, __fastcall, which is not visible in the IL code because it is the only convention used. In the P/Invoke example, the IL will have a different calling convention depending on the unmanaged code you are calling.
It is possible to add new InternalCalls, but it requires changes in the CLR. You would have to execute your CLS-compliant language library using a custom build of the CLR. Instructions on how to do this are available on GitHub.
Calling CLR from managed code
We discussed two ways to call unmanaged code. I now want to focus on calling CLR code, as this is part of our sorting journey and how TrySZSort is called. The methods discussed above, InternalCall and P/Invoke, are translated to something slightly different within the CLR; this is called ECall.
ECall is a private native calling interface. This interface is inside the CLR. When managed code wants to access internal code in the CLR, it tells the runtime (execution engine) to use ECall to find the function and its entry point. The runtime then calls this function. As with the previous examples, a calling convention and marshalling are involved.
ECall is a set of tables to call functions within the EE (Execution Engine) from the classlibs. First we use the class name & namespace to find an array of function pointers for a class, then use the function name (& sometimes signature) to find the correct function pointer for your method.
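The two-step lookup described in that quote can be sketched in a few lines of C++. This is a simplified illustration, not the CLR's actual data structures: FuncEntry, ClassEntry, FindEntryPoint and the placeholder functions are hypothetical names, and signature matching is left out.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical, simplified stand-ins for the ECall tables.
using NativeEntry = void (*)();

struct FuncEntry {
    const char* methodName;
    NativeEntry entryPoint;
};

struct ClassEntry {
    const char* namespaceName;
    const char* className;
    const FuncEntry* funcs;
    std::size_t funcCount;
};

// Placeholder native implementations; the real ones live inside the CLR.
void TrySZSortPlaceholder() {}
void TrySZBinarySearchPlaceholder() {}

// Per-class array of function entries.
const FuncEntry kArrayFuncs[] = {
    {"TrySZSort", &TrySZSortPlaceholder},
    {"TrySZBinarySearch", &TrySZBinarySearchPlaceholder},
};

// Class table: namespace + class name -> array of function entries.
const ClassEntry kClasses[] = {
    {"System", "Array", kArrayFuncs,
     sizeof(kArrayFuncs) / sizeof(kArrayFuncs[0])},
};

// Step 1: find the class by namespace and name.
// Step 2: find the method by name inside that class's array.
NativeEntry FindEntryPoint(const char* ns, const char* cls, const char* method) {
    for (const ClassEntry& c : kClasses) {
        if (std::strcmp(c.namespaceName, ns) != 0 ||
            std::strcmp(c.className, cls) != 0) {
            continue;
        }
        for (std::size_t i = 0; i < c.funcCount; ++i) {
            if (std::strcmp(c.funcs[i].methodName, method) == 0) {
                return c.funcs[i].entryPoint;
            }
        }
        return nullptr;  // class found, method not registered
    }
    return nullptr;      // class not registered
}
```

A call such as FindEntryPoint("System", "Array", "TrySZSort") returns the registered function pointer, while an unregistered class or method yields nullptr.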
There are two types of ECall: FCall and QCall.
FCall[[x]][fcall] is more performant but also riskier and much more difficult to write correctly, and it uses InternalCall.
QCall is less performant but much safer and uses P/Invoke. It is described as the default and preferred option when calling code in the CLR. FCall should only be used when there is really a good reason for using it.
QCalls are the preferred mechanism going forward. You should only use FCalls when you are “forced” to. This happens when there is common “short path” through the code that is important to optimize. This short path should not be more than a few hundred instructions, cannot allocate GC memory, take locks or throw exceptions [[x]][qcall-preffered]
Microsoft is moving more code from FCall to managed code.
We have ported some parts of the CLR that were heavily reliant on FCalls to managed code in the past (such as Reflection and some Encoding & String operations), and we want to continue this momentum. We may port our number formatting & String comparison code to managed in the future. [source][fcall-deprecation]
QCall
- is safer but less performant
- default and preferred choice
QCall declaration on managed side.
FCall
- should only be used when it is possible to make performance gains by using it
- is more performant when Frames (HelperMethodFrame) are not used
- you need to create a
Frame to handle exceptions or
- susceptible to
GC holes[x] and
- more error prone due to manual control of
FCall declaration on managed side.
I was curious what needs to be done to add a new FCall. This commit from the MS team is a great example.
- register the function in an ECClass table, a static table with entry points to FCall functions. The JIT uses it to find the entry points. Example: TrySZSort entrypoint
- add the function to the ECFunc array for the class that has this function. Example: TrySZSort
- add an extern static function with the InternalCall decorator in managed code
- use the FCIMPL macro to generate the function; your code needs to be inside this macro
FCall and TrySZSort
TrySZSort is exposed to managed code using InternalCall, so it is using FCall. On the CLR side it uses the FCIMPL4 macro to generate the function.[[x]][try-sz-sort-impl]
4 is the number of arguments, and FC_BOOL_RET tells the macro that this function returns BOOL. Why is the FCIMPL macro needed?
FCALLS have to conform to the Execution Engine calling conventions and not to C calling conventions; FCALLS need to be declared using special macros (FCIMPL*) that implement the correct calling conventions.
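To show why a macro is a convenient tool here, below is a heavily simplified C++ sketch of the idea. DEMO_FCIMPL2, DEMO_FCIMPLEND, DEMO_CALLCONV and IsOrderedPair are hypothetical names; the real FCIMPL* macros in the CLR do considerably more. The point is only that the macro, not the author of the function, decides which calling convention ends up in the generated signature.

```cpp
// Assumption for this sketch: only 32-bit x86 MSVC builds get an explicit
// __fastcall; everywhere else the platform default convention is used.
#if defined(_MSC_VER) && defined(_M_IX86)
#define DEMO_CALLCONV __fastcall
#else
#define DEMO_CALLCONV
#endif

// Opens a function definition with the chosen calling convention.
#define DEMO_FCIMPL2(rettype, funcname, a1, a2) \
    rettype DEMO_CALLCONV funcname(a1, a2) {

// Closes the function body opened by DEMO_FCIMPL2.
#define DEMO_FCIMPLEND }

// The function body is written between the two macros.
DEMO_FCIMPL2(bool, IsOrderedPair, int left, int right)
    return left <= right;
DEMO_FCIMPLEND
```

After preprocessing, IsOrderedPair is an ordinary function that can be called as IsOrderedPair(1, 2); changing the convention for every such function means editing one macro instead of every declaration.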
It all has to do with calling conventions. It was mentioned before that a calling convention is like a contract, but how does it work? As it is not a small topic, I created a separate blog post for it. This will be our next part.