The last part ended with a sneak peek of native code. This part expands on it: we are entering the CLR world. But first things first. Why do all this complicated stuff instead of having TrySZSort in managed code? The answer is, of course, speed. Let’s look into that.
Why is part of the Sort in native code?
We can easily check how this code performs in managed code. As mentioned before, if you use a custom IComparer, a managed version of the code is used. I am going to use BenchmarkDotNet to check the difference. This library is excellent and beats Stopwatch-based measurements.
The code below defines a custom comparer, as we need it to force the managed version of the sort. Then BenchmarkDotNet is used to generate a benchmark. There are four different runs: unmanaged and managed with the size of the list from ten to ten thousand. Each list contains random integers (0-100 range).
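The two code paths being compared can be sketched without any benchmarking harness. This is a minimal illustration, not the benchmark used for the numbers in this post: the names IntComparer and SortPaths are hypothetical, and the comparer body is an assumption about what a straightforward custom comparer looks like.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer, named here for illustration only.
public sealed class IntComparer : IComparer<int>
{
    public int Compare(int x, int y)
    {
        if (x < y) return -1;
        if (x > y) return 1;
        return 0;
    }
}

public static class SortPaths
{
    // Fills a list with pseudo-random integers in the 0-100 range.
    public static List<int> CreateList(int size, int seed)
    {
        var random = new Random(seed);
        var list = new List<int>(size);
        for (int i = 0; i < size; i++)
        {
            list.Add(random.Next(0, 101)); // upper bound is exclusive
        }
        return list;
    }

    // No comparer: on the runtime discussed in this series,
    // this is the call that can end up in TrySZSort.
    public static void DefaultSort(List<int> list) => list.Sort();

    // Custom comparer: forces the managed implementation of the sort.
    public static void ManagedSort(List<int> list) => list.Sort(new IntComparer());
}
```

Both methods produce the same ordering; only the implementation that does the work differs, which is what the benchmark measures.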
This test is definitely not perfect, but it still shows a 4-5x difference in favour of native code. Carefully written and manually tuned native code will be faster, and that is the case with TrySZSort. Managed code hides manual memory handling, providing simplicity and security. It helps commoditize work, as software becomes cheaper to write. But like every abstraction, it comes at a price: we give up some control and flexibility to optimize the code ourselves. The impact of that choice depends on the workload; in a typical line-of-business app you might not have to worry about the performance price. Even if you have to, you optimize the bottlenecks, not the whole system.
Sorting is a specialized problem that gains a lot from native code optimizations.
Marc Gravell pointed out that the current Compare method is not optimal.
very interesting, but also looks like an artificially slow managed compare; why not just => x - y ?
— Marc Gravell (@marcgravell) August 1, 2018
After implementing the x - y logic, the results look slightly better for ManagedSort.
10000 items the mean time for managed sort has dropped by around
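The suggested comparer is a one-liner. A minimal sketch, with the hypothetical name SubtractingComparer:

```csharp
using System.Collections.Generic;

// Hypothetical name; implements the "x - y" suggestion from the tweet above.
public sealed class SubtractingComparer : IComparer<int>
{
    // Negative when x < y, zero when equal, positive when x > y.
    // Safe only while x - y cannot overflow, e.g. for values in the 0-100 range.
    public int Compare(int x, int y) => x - y;
}
```

The subtraction trick fits the benchmark data, which stays in the 0-100 range. For arbitrary integers the difference can overflow and flip the sign, so x.CompareTo(y) is the safer choice there.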
Calling unmanaged code
There is a special keyword, extern, that is used to create a connection between managed and native code. It tells the runtime that the implementation of a function is in a different (external) place. There are two ways to call external code.
P/Invoke (Platform Invocation)
Platform invocation is the mechanism provided by the Common Language Runtime to facilitate calls from managed code to unmanaged code functions. Behind the scenes the runtime constructs the so-called stub, or thunk, which allows the addressing of the unmanaged function and conversion of managed argument types to the appropriate unmanaged types and back. This conversion is known as parameter marshalling.1
To call unmanaged code using P/Invoke, a function has to be extern and have the DllImport attribute. This attribute takes the DLL file name as a parameter. DLL files are dynamic-link libraries containing compiled code and metadata about it. Part of that metadata is an address table holding the memory addresses of all exported functions, the entry points.
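A minimal sketch of such a declaration is shown here. The class names NativeMethods and PInvokeDemo are hypothetical; the library name is deliberately one that does not exist, so the call fails when the runtime tries to locate it.

```csharp
using System;
using System.Runtime.InteropServices;

public static class NativeMethods
{
    // extern: there is no managed body.
    // DllImport: the implementation is expected in the named library.
    [DllImport("nonexistinglib.dll")]
    public static extern bool Function(int i);
}

public static class PInvokeDemo
{
    // Returns true when the runtime could not locate the library.
    public static bool LibraryIsMissing()
    {
        try
        {
            NativeMethods.Function(1);
            return false;
        }
        catch (DllNotFoundException)
        {
            return true;
        }
    }
}
```

The declaration itself compiles and loads without the library being present. The lookup happens on the first call, which is why the failure surfaces as an exception at the call site.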
In this example, Function is in an external DLL, it is generated to an IL code with
Pinvokeimpl tells the runtime that this is an unmanaged method called using
P/Invoke. It is available in the library nonexistinglib.dll and has the calling convention winapi. More on calling conventions later in this post, but in a nutshell: a calling convention describes how to call this particular function, something like a contract between HTTP services. An unmanaged DLL expects a certain contract to be met in order to accept a call to a function.
winapi convention is an alias of
What happens in this example with the boolean function is that the CLR:
- JITs the code (just-in-time compilation)
- finds pinvokeimpl
- finds the entry point to the unmanaged function
- prepares the contract using the winapi calling convention and marshalls the parameter int32 i (marshalling -> serialization)
- calls the function
- uses the winapi calling convention to unmarshall the bool value
It is actually a bit more complicated, as there are things like the execution context or a sentinel item put on the stack frame to mark the boundary between managed and unmanaged code, but we are not going to get into these details.
A good real-life example of
Read is called in the end
Win Api DLL's KERNEL32 and
WIN32.ReadFile is used. This dll is part of Windows kernel.
InternalCall is a different way to call an unmanaged function. It is more efficient because InternalCalls live in the CLR context, which makes it possible to relax some time-consuming rules and save on operations around security, exception handling, etc. Unfortunately it is not possible to create
assemblies. This call can only be used when calling functions implemented in
CLR is not only about the runtime; it also contains optimized code used in many places, like StringBuilder.ToString(). This is one of the reasons why using StringBuilder is a good practice.
From the IL point of view, the code looks a bit different than in
P/Invoke and uses
internalcall - tells the runtime to look for the implementation of the code inside
CLR maintains its own function entry point address table (example later in the post) used to find the address of the entry point. It also uses a special calling convention, __fastcall (more on this in the next post).
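The internalcall flag is part of a method's metadata, so it can be inspected with reflection. This is a minimal sketch with hypothetical names (InternalCallInspector, Add, Describe); whether a given framework method carries the flag, or exists at all, depends on the runtime version, so no particular output is assumed.

```csharp
using System;
using System.Reflection;

public static class InternalCallInspector
{
    // True when the method's metadata carries the internalcall implementation flag.
    public static bool IsInternalCall(MethodBase method)
    {
        MethodImplAttributes flags = method.GetMethodImplementationFlags();
        return (flags & MethodImplAttributes.InternalCall) != 0;
    }

    // An ordinary managed method, used for comparison.
    public static int Add(int a, int b) => a + b;

    // Looks a method up by name and reports its flag.
    // Assumes the name is not overloaded on the given type.
    public static string Describe(Type type, string name)
    {
        const BindingFlags All = BindingFlags.Public | BindingFlags.NonPublic
                               | BindingFlags.Static | BindingFlags.Instance;
        MethodInfo method = type.GetMethod(name, All);
        if (method == null)
        {
            return type.FullName + "." + name + ": not found on this runtime";
        }
        return type.FullName + "." + name + ": internalcall = " + IsInternalCall(method);
    }
}
```

Calling InternalCallInspector.Describe(typeof(Array), "TrySZSort") reports what the current runtime exposes, while IsInternalCall on the Add method above returns false because it has a managed body.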
It is possible to add new InternalCalls, but it requires changes in the CLR. You would then have to compile your customized CLR and run your code using this new runtime. Instructions on how to do this are available on GitHub - building5 - running6.
Calling CLR from managed code
We have discussed two ways of calling unmanaged code. Now I want to focus on how the CLR is actually called. Both P/Invoke and InternalCall are translated to something slightly different within the CLR context: ECall. ECall is a private native calling interface that lives inside the CLR. When managed code wants to access internal code in the CLR, it tells the runtime (execution engine) to use ECall to find the function and its entry point. As with the previous examples, there is also a calling convention and marshalling done along the way.
ECall is a set of tables to call functions within the EE (Execution Engine) from the classlibs. First we use the class name & namespace to find an array of function pointers for a class, then use the function name (& sometimes signature) to find the correct function pointer for your method.7
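The two-step lookup described in the quote can be modelled in a few lines. This is only a conceptual sketch in managed code: the real tables are static native arrays inside the CLR, and the name ECallTableModel is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Conceptual model only: class name -> (function name -> entry point).
public sealed class ECallTableModel
{
    private readonly Dictionary<string, Dictionary<string, Delegate>> classes =
        new Dictionary<string, Dictionary<string, Delegate>>();

    public void Register(string className, string functionName, Delegate entryPoint)
    {
        if (!classes.TryGetValue(className, out var functions))
        {
            functions = new Dictionary<string, Delegate>();
            classes[className] = functions;
        }
        functions[functionName] = entryPoint;
    }

    public Delegate Find(string className, string functionName)
    {
        // Step 1: class name & namespace -> table of functions for that class.
        if (!classes.TryGetValue(className, out var functions))
        {
            return null;
        }
        // Step 2: function name -> entry point.
        return functions.TryGetValue(functionName, out var entryPoint) ? entryPoint : null;
    }
}
```

A lookup such as Find("System.Array", "TrySZSort") first narrows the search to one class and only then searches by function name, which mirrors the order described above. The model ignores the signature matching that the quote mentions as an occasional third step.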
There are two types of ECall:
- FCall8 is more performant but riskier, much more difficult to write correctly, and uses InternalCall
- QCall is less performant but much safer and uses P/Invoke
QCall is mentioned as the preferred option when calling code in the CLR. FCall should only be used when there is really a good reason for it.
QCalls are the preferred mechanism going forward. You should only use FCalls when you are “forced” to. This happens when there is a common “short path” through the code that is important to optimize. This short path should not be more than a few hundred instructions, cannot allocate GC memory, take locks or throw exceptions 9
Microsoft is moving more code from FCall to managed code.
We have ported some parts of the CLR that were heavily reliant on FCalls to managed code in the past (such as Reflection and some Encoding & String operations), and we want to continue this momentum. We may port our number formatting & String comparison code to managed in the future. [^fcall-deprecation]
- is safer but less performant
- default and preferred choice
QCall declaration on the managed side.
- should only be used when it is possible to make performance gains by using it
- are more performant when Frames (HelperMethodFrame) are not used
- you need to create a Frame to handle Exceptions or
- susceptible to
GC holes12 and
- more error-prone due to manual control of
FCall declaration on the managed side.
I was curious what needs to be done to create a new FCALL. This commit from the MS team is a great example.
- register the function in an ECClass table, a static table with entry points to FCALL functions. This is used by the jitter to find the entry points. Example: TrySZSort entrypoint
- add the function to the ECFunc array for the class that has this function. Example: TrySZSort
- add an extern static function with the InternalCall attribute in managed code
- use the FCIMPL macro to generate the function; your code needs to be inside this macro
.NET Sort -> TrySZSort uses FCall
TrySZSort is exposed to managed code using InternalCall, so it is using FCall. On the CLR side it uses the FCIMPL4 macro for function generation. 4 is the number of arguments,
FC_BOOL_RET tells the macro that this function returns
Why is the FCIMPL macro needed?
Since FCALLS have to conform to the Execution Engine calling conventions and not to C calling conventions, FCALLS need to be declared using special macros (FCIMPL*) that implement the correct calling conventions.
It all has to do with calling conventions. It was mentioned before that a calling convention is like a contract describing the way functions communicate using the stack and registers, but how does it work? This will be covered in the next post with a journey into assembly code.