Sort algorithms

The common presentation of in place sort states that there are 3 'naive' approches to sorting: Bubble sort (also called exchange sort), Selection and Insertion sort, and three more elaborated ways to sort: Heapsort, shell sort and quicksort. There are in fact many more ways to sort (in place or not in place, based on comparisons or based on some other property of the key...). There are also many local improvements of the base algorithms that can useful to know about. These pages show you all the base approaches, as well as a few of the local improvements that can be performed.

The following algorithms are written in Java, as they are used in the visualizer. They rely on the following class:

abstract class SortAlgorithm {
// the important members
int size; int array[];
// members to be used exclusively in subclasses to handle the sort
public void exchange (int i, int j) {}
public int compare (int i, int j) {}
public int compare_val (int val, int i) {}
public void assign (int ndx, int val) {}
public int item (int i) {}
public void delay(float f) {}
abstract void sort ();
// the function that implements the actual sort, using the above functions.
};

To study those algorithms, you may want to open 2 windows in your web browser, one that describes the algorithm, and one on the visualizer, which lets you see how this algorithm performs.

Educational approaches

Selection Sort

Bubble Sort & Shaker Sort

Insertion Sort, Dichotomic insertion (1 & 2)

Heap Sort

Professional approaches

QuickSort & Quicker Sort

In place stable sort

The two following approaches are interesting to study and have in some cases constitute an interesting choice, even though they are hardly used in practice:

ShellSort

Dobosiewicz Sort also called CombSort

On Studying the complexity of sorting algorithms

When using the vizualiser to compare algorithms, never forget that the vizualiser sorts only very small arrays. The effect of quadratic complexity (either a square number of moves or a square number of exchanges) is dramatic as the size of the array grows. For instance, dichotomic insertion, which is only marginally slower than quicksort on 100 items, becomes 10 times slower on 10000 items!

Number of comparisons

The number of comparisons is often the driving factor in the 'slowness' of an algorithm. Indeed, a comparison can involve much more computation than the base loop of the sort algorithm: for instance, when sorting almost identical strings, one can have to compare character for character on large lengths before decising which string comes first. Sometimes, while pointers to the objects are stored in memory and thus cheap to move, the actual data must be kept on remote storage, and loaded for comparisons purposes. In this later situation, studying the best strategy becomes even more complex, as one can consider caching a few sort keys in memory: this means that a good algorithm may not try to reduce the number of comparisons, but rather increase the locality of references, reusing cached data as much as it can before advancing further.

Still, one often considers the number of comparisons as the best mean to compare sort algorithms. But it is not sufficient: By this rating, dichotomic insertion would show the best algorithm, and it is almost never the case.

It is easy to see that a sort algorithm based on comparison has to do at least log2 n! comparisons in the worst case to find the proper location of each item (exercise: check the case where n=3, and find through a recursive reasoning how many additional comparisons you need to perform to find the order of n+1 items using only individual comparisons). One often translates this constraint in approximating the number of comparisons by n*log(n).

Number of exchanges

Any algorithm has to do at least n moves in the worst case. Yet, naive approaches are quadratic in number of exchanges. The best algorithm for number of exchanges is selection sort, quicksort coming fairly close.

Note that if you ignore the number of exchanges in your analysis, you do overlook an essential part of the problem. For instance, the dichotomic insertion sort is the most efficient algorithm one can devise in numbers of comparisons, yet, it still has a quadratic complexity.

Stack and additional costs

Modern algorithms like quicksort and in place stable sort hide some memory costs to their apparent efficiency: Both require a stack to remember the sub arrays that need to be sorted or merged. In the case of quicksort, this stack can grow in the worst case to be proportional to log(size of the array), which makes it unfair to call quicksort a true 'in place' sort.

Conclusion

A good sorting algorithm must balance complexity in number of comparisons, number of exchanges, being at most comparable to n log n in number of comparisons and linear in number of exchanges. Stack costs should be considered too.

While quicksort is the most commonly used sort, one can consider simpler and more efficient algorithms, if one has a good hint of the distribution of the data to sort.

In place sort algorithms