sort method derivation from method headers

I present merge sort to students much differently than I present selection sort and insertion sort. Instead of tracing the algorithm first, I ask students to derive the merge sort algorithm from the method headers.

Step 1 method headers

The method headers for a recursive implementation of merge sort are below. Preconditions and postconditions are included.

// postcondition: arr[0] ... arr[arr.length - 1] is sorted
public static void sort(int[] arr)
{
    
}

// postcondition: arr[low] ... arr[high] is sorted &&
//                elements outside of this range are unchanged
private static void sort(int[] arr, int low, int high)
{
    // This method is recursive.
}

// precondition: arr[low] ... arr[mid] is sorted &&
//               arr[mid + 1] ... arr[high] is sorted
//
// postcondition: arr[low] ... arr[high] is sorted &&
//                elements outside of this range are unchanged
private static void merge(int[] arr, int low, int mid, int high)
{
    // Assume, for now, that this method already works.
}

Step 2 recursive sort method base case

sort(int[], int, int) is recursive. When writing recursive methods, we start with a base case (or possibly base cases). The method is intended to sort a range of values in arr. If there is only 1 value (low == high), we return (stop) since a single value is already sorted. If there are 0 values (low > high), we return for the same reason.

// postcondition: arr[low] ... arr[high] is sorted &&
//                elements outside of this range are unchanged
private static void sort(int[] arr, int low, int high)
{
    // base case
    if(low >= high)
        return;
}

A return statement in a void method ends the method. It is also possible to write the opposite of the condition (low < high) and write the rest of the method as the body of the conditional statement.

Step 3 rest of recursive sort method

When writing recursive methods, we assume that the method we are currently writing already works, as long as we call it with something that approaches the base case. We assume that sort(int[], int, int) works as long as we call it with fewer elements.

We assume that merge works because we were told to.

The postcondition of merge is identical to the postcondition of sort(int[], int, int). If we could satisfy the precondition of merge, we could call it and accomplish the task of sorting arr[low] ... arr[high].

The preconditon of merge is arr[low] ... arr[mid] is sorted and arr[mid + 1] ... arr[high] is sorted. We have a method that sorts parts of an array (the method we are currently writing).

// postcondition: arr[low] ... arr[high] is sorted &&
//                elements outside of this range are unchanged
private static void sort(int[] arr, int low, int high)
{
    // base case
    if(low >= high)
        return;
    
    int mid = (low + high) / 2;

    sort(arr, low, mid);
    sort(arr, mid + 1, high);
    merge(arr, low, mid, high);
}

The recursive calls to sort approach the base case because each call has fewer elements than the original range. (The base case is 0 or 1 elements.) The call to merge works because its precondition is met. The postcondition for merge is the same as the postcondition for sort(int[], int, int), so we’re done.

The public sort method

sort(int[], int, int) is a private recursive helper method. Recursive helper methods are used when different parameters are desired than those taken by the public method. The public method sort(int[]) calls the private method sort(int[], int, int).

// postcondition: arr[0] ... arr[arr.length - 1] is sorted
public static void sort(int[] arr)
{
    sort(arr, 0, arr.length - 1);
}

The merge method

For the derivation above, we assume the merge method works as specified by its precondition. This assumption allows us to focus on the interesting sort method.

Below, we trace the merge method for a single step.

merge method simple trace

Step 1

arr -> [36 71 78 79 86 11 24 35 75 77]
        0  1  2  3  4  5  6  7  8  9

low: 0
mid: 4
high: 9

The precondition has been met. arr[0] ... arr[4] is sorted and arr[5] ... arr[9] is sorted.

For this trace, low and high are the lowest and highest indexes of arr, respectively. The approach is exactly the same when working with a smaller range of elements. We would just ignore all elements outside arr[low] ... arr[high].

Step 2

A simple implementation of merge copies both sorted halves into temporary arrays. We maintain 3 indexes.

arr ->        [36 71 78 79 86 11 24 35 75 77]
               ^
               arrIndex

firstHalf ->  [36 71 78 79 86]
               ^
              firstIndex

secondHalf -> [11 24 35 75 77]
               ^
               secondIndex

Step 3

We find the smaller of firstHalf[firstIndex] and secondHalf[secondIndex]. We copy the smaller element to arr[arrIndex], overwriting the existing value. (We favor firstHalf[firstIndex] if firstHalf[firstIndex] == secondHalf[secondIndex].)

We increment arrIndex and the index of the array from which a value was selected. In this case, we increment arrIndex and secondIndex.

arr ->        [11 71 78 79 86 11 24 35 75 77]
                  ^
                  arrIndex

firstHalf ->  [36 71 78 79 86]
               ^
               firstIndex

secondHalf -> [11 24 35 75 77]
                  ^
                  secondIndex

Step 4

We repeat the process of selecting the smaller of firstHalf[firstIndex] and secondHalf[secondIndex] until we run out of elements in firstHalf or secondHalf.

arr ->        [11 24 35 36 71 75 77 35 75 77]
                                    ^
                                    arrIndex

firstHalf ->  [36 71 78 79 86]
                     ^
               firstIndex

secondHalf -> [11 24 35 75 77]
                              ^
                              secondIndex

Step 5

We copy the remaining elements from the half that still has elements into arr. In this case, we copy the remaining elements from firstHalf. There are no further comparisons. The remaining elements are simply copied into arr starting at arrIndex.

arr ->        [11 24 35 36 71 75 77 78 79 86]
                                             ^
                                             arrIndex

firstHalf ->  [36 71 78 79 86]
                              ^
                              firstIndex

secondHalf -> [11 24 35 75 77]
                              ^
                              secondIndex

merge method simple implementation

The implementation below uses the easily understood approach of copying both halves into temporary arrays. This is the approach shown in the trace above.

In the trace above, low and high were the lowest and highest indexes of arr, respectively. The implementation below does not assume this. It works with all valid values of low and high.

// precondition: arr[low] ... arr[mid] is sorted &&
//               arr[mid + 1] ... arr[high] is sorted
//
// postcondition: arr[low] ... arr[high] is sorted &&
//                elements outside of this range are unchanged
private static void merge(int[] arr, int low, int mid, int high)
{
    int[] firstHalf = new int[mid - low + 1];
    for(int i = 0; i < firstHalf.length; i++)
        firstHalf[i] = arr[low + i];
    
    int[] secondHalf = new int[high - mid];
    for(int i = 0; i < secondHalf.length; i++)
        secondHalf[i] = arr[mid + 1 + i];
    
    int arrIndex = low;
    int firstIndex = 0;
    int secondIndex = 0;
    
    while(firstIndex < firstHalf.length && secondIndex < secondHalf.length)
    {
        if(firstHalf[firstIndex] <= secondHalf[secondIndex])
        {
            arr[arrIndex] = firstHalf[firstIndex];
            firstIndex++;
        }
        else
        {
            arr[arrIndex] = secondHalf[secondIndex];
            secondIndex++;
        }
        
        arrIndex++;
    }
    
    while(firstIndex < firstHalf.length)
    {
        arr[arrIndex] = firstHalf[firstIndex];
        firstIndex++;
        arrIndex++;
    }
    
    while(secondIndex < secondHalf.length)
    {
        arr[arrIndex] = secondHalf[secondIndex];
        secondIndex++;
        arrIndex++;
    }
}

It is also possible to write merge with a single while loop. The conditional statements would be modified to check if the indexes are valid.

Merge sort recursive implementation

In the implementation below, both sort methods are identical to those presented earlier. The merge method has been improved to copy only half of arr[low] ... arr[high] into a temporary array.

When I have time, I’ll post a trace of the improved merge method. Briefly, it copies only the first half into a temporary array, because we won’t run into where we are copying from in the second half anyway. This also offers the advantage that if we run out of elements in the first half, the ones in the second half are already where we want them.

// postcondition: arr[0] ... arr[arr.length - 1] is sorted
public static void sort(int[] arr)
{
    sort(arr, 0, arr.length - 1);
}

// postcondition: arr[low] ... arr[high] is sorted &&
//                elements outside of this range are unchanged
private static void sort(int[] arr, int low, int high)
{
    if(low >= high)
        return;
    
    int mid = (low + high) / 2;

    sort(arr, low, mid);
    sort(arr, mid + 1, high);
    merge(arr, low, mid, high);
}

// precondition: arr[low] ... arr[mid] is sorted &&
//               arr[mid + 1] ... arr[high] is sorted
//
// postcondition: arr[low] ... arr[high] is sorted &&
//                elements outside of this range are unchanged
private static void merge(int[] arr, int low, int mid, int high)
{
    int[] b = new int[mid + 1 - low];
    for(int i = 0; i < b.length; i++)
    	b[i] = arr[low + i];

    int aLowerIndex = low;
    int bIndex = 0;
    int aHigherIndex = mid + 1;
    
    while (aLowerIndex < aHigherIndex && aHigherIndex <= high)
    {
        if (arr[aHigherIndex] < b[bIndex])
        {
            arr[aLowerIndex] = arr[aHigherIndex];
            aHigherIndex++;
        }
        else
        {
            arr[aLowerIndex] = b[bIndex];
            bIndex++;
        }
        
        aLowerIndex++;
    }

    while (aLowerIndex < aHigherIndex)
    {
        arr[aLowerIndex] = b[bIndex];
        aLowerIndex++;
        bIndex++;
    }
}

AP CS A Exam sort & search algorithms

Comments

Comment on Merge sort