sort
method derivation from method headers
I present merge sort to students much differently than I present selection sort and insertion sort. Instead of tracing the algorithm first, I ask students to derive the merge sort algorithm from the method headers.
Step 1 method headers
The method headers for a recursive implementation of merge sort are below. Preconditions and postconditions are included.
// postcondition: arr[0] ... arr[arr.length - 1] is sorted
public static void sort(int[] arr)
{
}
// postcondition: arr[low] ... arr[high] is sorted &&
// elements outside of this range are unchanged
private static void sort(int[] arr, int low, int high)
{
// This method is recursive.
}
// precondition: arr[low] ... arr[mid] is sorted &&
// arr[mid + 1] ... arr[high] is sorted
//
// postcondition: arr[low] ... arr[high] is sorted &&
// elements outside of this range are unchanged
private static void merge(int[] arr, int low, int mid, int high)
{
// Assume, for now, that this method already works.
}
Step 2 recursive sort
method base case
sort(int[], int, int)
is recursive. When writing recursive methods, we start with a base case (or possibly base cases). The method is intended to sort a range of values in arr
. If there is only 1 value (low == high
), we return (stop) since a single value is already sorted. If there are 0 values (low > high
), we return for the same reason.
// postcondition: arr[low] ... arr[high] is sorted &&
// elements outside of this range are unchanged
private static void sort(int[] arr, int low, int high)
{
// base case
if(low >= high)
return;
}
A return
statement in a void
method ends the method. It is also possible to write the opposite of the condition (low < high
) and write the rest of the method as the body of the conditional statement.
Step 3 rest of recursive sort
method
When writing recursive methods, we assume that the method we are currently writing already works, as long as we call it with something that approaches the base case. We assume that sort(int[], int, int)
works as long as we call it with fewer elements.
We assume that merge
works because we were told to.
The postcondition of merge
is identical to the postcondition of sort(int[], int, int)
. If we could satisfy the precondition of merge
, we could call it and accomplish the task of sorting arr[low] ... arr[high]
.
The preconditon of merge is arr[low] ... arr[mid]
is sorted and arr[mid + 1] ... arr[high]
is sorted. We have a method that sorts parts of an array (the method we are currently writing).
// postcondition: arr[low] ... arr[high] is sorted &&
// elements outside of this range are unchanged
private static void sort(int[] arr, int low, int high)
{
// base case
if(low >= high)
return;
int mid = (low + high) / 2;
sort(arr, low, mid);
sort(arr, mid + 1, high);
merge(arr, low, mid, high);
}
The recursive calls to sort
approach the base case because each call has fewer elements than the original range. (The base case is 0 or 1 elements.) The call to merge
works because its precondition is met. The postcondition for merge
is the same as the postcondition for sort(int[], int, int)
, so we’re done.
The public sort
method
sort(int[], int, int)
is a private recursive helper method. Recursive helper methods are used when different parameters are desired than those taken by the public method. The public method sort(int[])
calls the private method sort(int[], int, int)
.
// postcondition: arr[0] ... arr[arr.length - 1] is sorted
public static void sort(int[] arr)
{
sort(arr, 0, arr.length - 1);
}
The merge
method
For the derivation above, we assume the merge
method works as specified by its precondition. This assumption allows us to focus on the interesting sort
method.
Below, we trace the merge
method for a single step.
merge
method simple trace
Step 1
arr -> [36 71 78 79 86 11 24 35 75 77]
0 1 2 3 4 5 6 7 8 9
low: 0
mid: 4
high: 9
The precondition has been met. arr[0] ... arr[4]
is sorted and arr[5] ... arr[9]
is sorted.
For this trace, low
and high
are the lowest and highest indexes of arr
, respectively. The approach is exactly the same when working with a smaller range of elements. We would just ignore all elements outside arr[low] ... arr[high]
.
Step 2
A simple implementation of merge
copies both sorted halves into temporary arrays. We maintain 3 indexes.
arr -> [36 71 78 79 86 11 24 35 75 77]
^
arrIndex
firstHalf -> [36 71 78 79 86]
^
firstIndex
secondHalf -> [11 24 35 75 77]
^
secondIndex
Step 3
We find the smaller of firstHalf[firstIndex]
and secondHalf[secondIndex]
. We copy the smaller element to arr[arrIndex]
, overwriting the existing value. (We favor firstHalf[firstIndex]
if firstHalf[firstIndex] == secondHalf[secondIndex]
.)
We increment arrIndex
and the index of the array from which a value was selected. In this case, we increment arrIndex
and secondIndex
.
arr -> [11 71 78 79 86 11 24 35 75 77]
^
arrIndex
firstHalf -> [36 71 78 79 86]
^
firstIndex
secondHalf -> [11 24 35 75 77]
^
secondIndex
Step 4
We repeat the process of selecting the smaller of firstHalf[firstIndex]
and secondHalf[secondIndex]
until we run out of elements in firstHalf
or secondHalf
.
arr -> [11 24 35 36 71 75 77 35 75 77]
^
arrIndex
firstHalf -> [36 71 78 79 86]
^
firstIndex
secondHalf -> [11 24 35 75 77]
^
secondIndex
Step 5
We copy the remaining elements from the half that still has elements into arr
. In this case, we copy the remaining elements from firstHalf
. There are no further comparisons. The remaining elements are simply copied into arr
starting at arrIndex
.
arr -> [11 24 35 36 71 75 77 78 79 86]
^
arrIndex
firstHalf -> [36 71 78 79 86]
^
firstIndex
secondHalf -> [11 24 35 75 77]
^
secondIndex
merge
method simple implementation
The implementation below uses the easily understood approach of copying both halves into temporary arrays. This is the approach shown in the trace above.
In the trace above, low
and high
were the lowest and highest indexes of arr
, respectively. The implementation below does not assume this. It works with all valid values of low
and high
.
// precondition: arr[low] ... arr[mid] is sorted &&
// arr[mid + 1] ... arr[high] is sorted
//
// postcondition: arr[low] ... arr[high] is sorted &&
// elements outside of this range are unchanged
private static void merge(int[] arr, int low, int mid, int high)
{
int[] firstHalf = new int[mid - low + 1];
for(int i = 0; i < firstHalf.length; i++)
firstHalf[i] = arr[low + i];
int[] secondHalf = new int[high - mid];
for(int i = 0; i < secondHalf.length; i++)
secondHalf[i] = arr[mid + 1 + i];
int arrIndex = low;
int firstIndex = 0;
int secondIndex = 0;
while(firstIndex < firstHalf.length && secondIndex < secondHalf.length)
{
if(firstHalf[firstIndex] <= secondHalf[secondIndex])
{
arr[arrIndex] = firstHalf[firstIndex];
firstIndex++;
}
else
{
arr[arrIndex] = secondHalf[secondIndex];
secondIndex++;
}
arrIndex++;
}
while(firstIndex < firstHalf.length)
{
arr[arrIndex] = firstHalf[firstIndex];
firstIndex++;
arrIndex++;
}
while(secondIndex < secondHalf.length)
{
arr[arrIndex] = secondHalf[secondIndex];
secondIndex++;
arrIndex++;
}
}
It is also possible to write merge
with a single while
loop. The conditional statements would be modified to check if the indexes are valid.
Merge sort recursive implementation
In the implementation below, both sort
methods are identical to those presented earlier. The merge
method has been improved to copy only half of arr[low] ... arr[high]
into a temporary array.
When I have time, I’ll post a trace of the improved merge
method. Briefly, it copies only the first half into a temporary array, because we won’t run into where we are copying from in the second half anyway. This also offers the advantage that if we run out of elements in the first half, the ones in the second half are already where we want them.
// postcondition: arr[0] ... arr[arr.length - 1] is sorted
public static void sort(int[] arr)
{
sort(arr, 0, arr.length - 1);
}
// postcondition: arr[low] ... arr[high] is sorted &&
// elements outside of this range are unchanged
private static void sort(int[] arr, int low, int high)
{
if(low >= high)
return;
int mid = (low + high) / 2;
sort(arr, low, mid);
sort(arr, mid + 1, high);
merge(arr, low, mid, high);
}
// precondition: arr[low] ... arr[mid] is sorted &&
// arr[mid + 1] ... arr[high] is sorted
//
// postcondition: arr[low] ... arr[high] is sorted &&
// elements outside of this range are unchanged
private static void merge(int[] arr, int low, int mid, int high)
{
int[] b = new int[mid + 1 - low];
for(int i = 0; i < b.length; i++)
b[i] = arr[low + i];
int aLowerIndex = low;
int bIndex = 0;
int aHigherIndex = mid + 1;
while (aLowerIndex < aHigherIndex && aHigherIndex <= high)
{
if (arr[aHigherIndex] < b[bIndex])
{
arr[aLowerIndex] = arr[aHigherIndex];
aHigherIndex++;
}
else
{
arr[aLowerIndex] = b[bIndex];
bIndex++;
}
aLowerIndex++;
}
while (aLowerIndex < aHigherIndex)
{
arr[aLowerIndex] = b[bIndex];
aLowerIndex++;
bIndex++;
}
}