Some sorting algorithms are better at solving a certain step of the process than others. For instance, insertion sort is good at detecting an already sorted list, and can do so in O(n)
time. However it has a worst case of O(n<sup>2</sup>) time, making it undesirable for larger tasks. On the other hand, Merge sort has a worst case time of O(n log n), but takes just as long to detect if a list is already sorted.
What if we could take the best parts of both these algorithms and make an algorithm with a best case of O(n) and a worst case of O(n log n)? As it happens, Tim Peters did just that in 2002 with the invention of Timsort!
Assumptions
- Knowledge of Big-O notation
- Familiarity with insertion sort and merge sort
What is Timsort?
Timsort is a stable, hybrid sorting algorithm that combines merge sort and insertion sort. The algorithm splits an array into "runs" of data, sorting them individually using insertion sort and pieces them back together using merge sort.
In application this works exceptionally well as most real world data is already nearly sorted. Exploiting this knowledge means that many of the "runs" can be completed in O(n) time.
Galloping
In Timsort, the standard merging algorithm alone does not satisfy to make it as fast as possible. When merging two runs, called A
and B
, sometimes one of the runs might contain many more smaller elements than the other run. When this is noticed, the algorithm jumps into what is known as galloping mode.
The algorithm keeps track of a variable called the minimum galloping threshold to make the most of the runs in the data provided. As the algorithm runs, this variable is changed based on its success with the data so far.
When galloping mode is enabled, the algorithm will attempt to find the first element of array A in array B or vice versa. What this means is, if the smallest element in B was larger than the largest element in A, Timsort will use Binary Search to find the correct place for the first element of B in A, and place a whole chunk of the array at a time instead of merging the elements one by one. This allows the merge to "gallop" over consecutive data.
> The code implementation of Timsort using galloping mode is a complex algorithm and varies widely in how it is constructed. A link to a Java implementation can be found <a href="https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/share/classes/java/util/TimSort.java" style="color:inherit;" target="_blank">here</a>