Array.Sort && TrySZSort
- Part 1 - Journey through the .NET internals - Sorting
- Part 2 - List.Sort internals
- Part 3 - This Article
- Part 4 - Managed vs Unmanaged code and interop
- Part 5 - Calling Conventions
In the previous part, we have explored the list internals, ending on
Array.Sort function. This one we will go down the rabbit hole and explore source code of
Array.Sort with a sneak peek of a mysterious
Location in the function call stack.
Array.Sort is called from within
List is a wrapper around the
array. Due to this, call to
Sort is redirected to
Array.Sort<T> which takes an array as an argument.
Source code. 1
_itemsfrom the list
0- beginning of the list
length-> count of the list as we want to sort all of it
There are two interesting attributes added to this function.
Code Access Security attribute. Marks function as being callable by transparent code. This is a special section of your code (or you can mark the whole assembly to be transparent) that creates a
sandboxed limited environment with rules like - cannot call native code, can call only other transparent code or
SecuritySafeCritical code, cannot elevate privileges or access protected resources. This feature is not really useful for web developers as we use HTTP and API layer to create our own
sandbox with defined can dos and cannot dos in the
contract. The situation is different in desktop and library development space.
SecuritySafeCritical attribute makes it possible for
sandboxed parts of the code to use
Verifiable code is code that gets compiled to IL and can be proven to not produce any IL that can execute unsafe code, bypass code access security checks or in any way corrupt the state of the CLR. 5
This attribute is part of a feature called
Constrained Execution Region. It was added to the framework for
CLR integration with
SQL Server 2005.
starting with the SQL Server™ 2005 release, SQL Server is able to host the common language runtime (CLR), allowing stored procedures, functions, and triggers to be written in managed code. Since access to these stored procedures must be fast, SQL Server hosts the CLR in-process. 6.
This feature is really useful when you write processes that have to be
reliable. And now you might say that hey everything should be reliable, the software I write is also reliable, we have unit tests, defensive coding, it works. Why would I even want to use
CER. Well, reliability is not a binary value. It is not like software is either reliable or not. It is like a grayscale. It is like with the cloud then you get
SLA7 which is not mentioning 100% availability but 99.999%. If you look at
AWS S3 even this service is not 100% durable and available. 8
Google in their great SRE talks series 9 defines more metrics like SLI, SLO and Error Budget. The world and software we write is more complicated than that. Writing fully reliable code is probably impossible and reaching each next SLA number costs more and more.
Writing reliable code in the face of everything that go wrong can be a daunting task. The good news is that unless you’re writing a framework or a library for use in CLR hosts that require prolonged periods of up time, you probably won’t need to think about this stuff too often.6
I come from a
simple web services world where my custom written code execution time is short lived. There is a request and response. Usually, it takes tens of milliseconds, maybe seconds - rarely minutes. Even if there is a failure most of the time it is possible to do a retry. But what if you would like to write a software that is not only reliable for a lifetime of a single request but for
months? This is where
CER can help, giving options to write code that is less susceptible to
failures. There is, of course, a cost associated with that - constrained options and things that you can do in the code.
Remember that in SQL Server, MTBF(Mean time beetwen failures)10 is measured in months not hours and the process restarting because an unhandled exception happened is completely unacceptable.11
In order to achieve more reliable code which was a requirement for a code running on
CER provides certain guarantess like:
- runtime delays throwing Thread.Abort exception waiting for
CERcode to execute.
- runtime prepares
CERcode as a priority to make sure there is a space on a stack and in memory. Decreasing the likelihood of throwing
out of band exceptions1213 like
OutOfMemoryexception. This is very useful in a scenario with
finallyblock not being able to execute because there is no memory left and
OutOfMemoryexception is thrown by the runtime. In this finally, block you might have an important set of functions that are needed to recover the process from crashing. This is a
unlikelyscenario, but if your service is running for
yearsthis can happen.
the runtime is constrained from throwing certain asynchronous exceptions that would prevent the region from executing in its entirety.6
This is not only about the runtime there are the limitations on the developer side of things like code cannot:
- acquire lock
- explicitly allocate
- call reflection
This is to prevent writing
riskier code in specifically designated code regions making them more reliable.
This creates a framework and an enforcement mechanism for authoring reliable managed code 6
CERs are a way to move any runtime-induced failure point from your code to a time either before the code runs (in the case of JIT compiling), or after the code completes (for thread aborts). 6
Ok, but what about
Sort method? First, this attribute is only useful in the
CER. You create
CER by using
PrepareConstrainedRegions() function. It is an explicit decision.
In order to avoid potential
try block if you have used
PrepareConstrainedRegion and your method has attribute
ReliabilityContract. For this method:
- all the assemblies are loaded
- code is compiled
- there is a check if the stack has (48KB) of space available (apparently 48K is an average method size)
… when used from within a constrained execution region a method marked with a ReliabilityContract will be prepared before execution by the JIT to be pre-compiled and the memory for the method will be pre-allocated when entering the PrepareConstrainedRegions. 11
CER got you interested then try this presentation 14 big thanks to Grzegorz Kotfis 15 for the link.
What is TrySZSort?
We have discussed attributes, time to talk about the code. This is the source of
Array.Sort. Exception throwing logic was replaced with comments to make it more readable. The core of sorting in .NET is an
external native function called
TrySZSort. Under the hood
C++ code that is part of the
CLR itself. This code is heavily optimized.
I want to focus on this code. This is were
managed code mixes with
TrySZSort - this is the real underlying function that is not part of
length > 1 is used to check if array requires sorting. If there is only one element there is no need to sort at all. It is also worth noting that
TrySZSort is called only for default comparer. If you provide custom
Comparer it won’t be used. For custom
Comparers, similar sorting algorithm as the one in
TrySZSort is executed inside
managed code. This, of course, lacks all the benefits of
unmanaged code and misses most of the native optimizations.
TrySZSort what does it mean? - There are two parts
Sort is obvious and
SZ comes from
Single-dimensioned zero based arrays
Single-dimensioned - It is a typical array with one dimension.
TrySZSort doesn’t support
TrySZSort supports only arrays starting with index
It got me thinking. If
zero based is explicit, does it mean that arrays could be
one based? Was there no agreement on that in the past? For me,
zero was always the first index. I just assumed that this is the standard. It kind of is now … but it wasn’t in the past. Hmm … can you create a non
zero-based array in
It is against the
CLS (Common Language Specification) - which is a set of rules and limitations that you need to follow to be
Compliant. Which means that your
code is usable by any
CLR language - (This is important when building frameworks or librarires).
All dimensions of an array must have a lower bound of zero. 16
CLR supports it and in
C# you can make
one, two, x index based arrays. By default, indexing is
zero based but you can force
C# to show the
hidden parts. Small warning. Don’t use non zero based arrays it in your code as it won’t be CLS compliant, it is also a bad and non intuitive practice this days. IL, JIT and CLR is also optimized to operate on
since zero-based arrays are by far the most common, Microsoft has spent a lot of time optimizing their performance. However, the CLR does support non-zero-based arrays but they are discouraged.17
I know what you are thinking right now
Show me how to make one based array. To do it you need to use
CLR keeps this as a backward compatibility for
Visual Basic code.
TrySZSort contains a glimpse of a long and complicated programming history. in
VB you were able to create arrays starting from index
VB .NET 2002 - you were able to create
x based arrays. VB evolved from different language history than
C# that took a lot from
The concept of arrays starting from
0 has also an interesting history. It wasn’t that obvious which approach was better. Digging around this topic yields interesting discussions about easier
pointer arithmetic if arrays are
0 representing a start of the memory address is also more
1. Some people point to the
Math and certain operations that are easier if we assume start of the array as
Array.CreateInstance vs new
When I used
Array.CreateInstance for the first time, I wondered how different it is from using
new. Jon Skeet points to some directions in his SO answer20
newarr is built into the language while
CreateInstance is a method call that actually leads to the unmanaged code inside
newarr is a IL instruction responsible for the creation of a
vector which is a different name for zero-based single dimensional array. Size of array
99 is pushed to the stack using
newarr uses this value from the stack to create the vector.
CreateInstance is in unmanaged code just like
TrySZSort, it is time to peek in there.
First peek at the unmanaged world
As I mentioned earlier
TrySZSort is a extern function.
extern is a special keyword used to indicate `external’ resources.
When a method declaration includes an extern modifier, that method is said to be an external method. External methods are implemented externally, typically using a language other than C#. 22
In this case,
TrySZSort is a method with implementation in
CLR - extern is used to make that connection between managed code and the the
CLR. We are entering
C++ world now. Source code for
TrySZSort can be found here 23. It is great that Microsoft open sourced it, as we can check it.
In next part, we will look into how it is possible to call
These are exceptions that are thrown by the runtime, not the code - StackOverflowException, OutOfMemoryException and ThreadAbortException. ↩