CS 151 - Introduction to Data Structures

Quicksort

Quicksort is undoubtedly quick. It is generally regarded as the fastest sorting algorithm. Over the years there have been many approaches to making quicksort even faster. In this homework you will implement quicksort and one of the methods for improving the speed of quicksort. Then you will test how much faster the improved version is than basic quicksort.

Tasks (probably best done in this order):
  1. Copy from /home/gtowell/Public/151/A07 the following code: SortBase.java and Selection.java. Selection.java implements selection sort. It is included only as an example of the use of the SortBase class. Mostly what SortBase gives you is a function for generating a randomized list of integers.
  2. Write a recursive implementation of Quicksort that is consistent with the pseudocode presented in class. (Note that your implementation need only be recursive at the "top level". That is, the split function can use loops.) This code should be extensively documented .. perhaps to the point of a "//" style comment per line explaining exactly what the line does.
  3. Implement Insertion sort. The implementation should be consistent with the pseudocode from Nov 11 rather than the code from class on Nov 9. For the problem below, the Nov 11 pseudocode it will be considerably faster than the Nov 9 version.
  4. Time your quicksort implementation for sorting arrays of 10,000 up to 10,240,000 items. If your computer bogs down on 10 million then stop early. Increment sizes by a factor of 4. (10000, 40000, 160000, 640000, ...). Because you are dealing with randomness, at each step, record the average over at least 5 trials (and preferably more than 20). (Suggestions: use a spreadsheet to record your data; use loops in you testing code so you can run all 5 (or 20) tests with a single command. Then calculate the average within your program.) This will make the data recording task a lot easier.
  5. Adapt your quicksort algorithm to stop recurring when the size of the portion of the array being recurred upon falls below some number M. For eaxmple, if M=3, then the following might be the before and after for an array on which quicksort was called
    		original data           10 5 3 2 6 7 1 9 8 4
    
    		partioion on 4         1 3 2   4   6 5 9 10 8
    
    		lower set "done" 
    		partition upper on 8             6 5   8   9 10
    		
    		upper set now "done"
    		result                    1 3 2 4 6 5 8 9 10
    	
    Note that in the lower set of numbers the numbers within group of size M (or less) may not be sorted, but that they are always less than the numbers in the M sized groups to the right.

    After the quicksort completes, pass the resulting semi-sorted array to Insertion sort. This algorithm is often called Hybrid Quicksort (because it is a hybrid of insertion sort and quicksort).
  6. Collect times for your Hybrid Quicksort that exactly mirror those you collected for standard quicksort. With any luck your times will be faster for hybrid than for standard. Your times for hybrid should certainly NOT grow according to O(N*N) that you might expect from insertion sort. You may only see a difference at the largest sort sizes. For my implementation, the speedup is about 10%. Some of you may have a slightly negative improvement. Hybrid should certainly not be a lot worse that standard quicksort.
  7. In your readme file include a table of you timing runs for standard and hybrid quicksort similar to the tables presented in class.
  8. In your readme file include a paragraph explaining why hybrid quicksort is expected to be faster than quicksort (even if your data suggests no difference). If you cannot figure out a good answer to this question, at least answer a simpler question, why does calling insertion sort after stopping quicksort early not simply result in O(n*n) time spent in sorting.

What to Turn In

  1. All code your wrote or edited, well commented
  2. README with the usual information. This file should follow the format of this sample README (https://cs.brynmawr.edu/cs151/README.txt) Within the README file, include a brief (max 1 paragraph) reflection on what went well and/or poorly and why.
  3. Either in your README file or separately, a table showing timing of quicksort and hybrid quicksort as requested above.
  4. Either in your README or a separate file, one paragraph (approximately) on why hybrid quicksort works in less han O(n*n) time.
  5. A "script" file (see assignment 4, and lab 3, Sep 14 ) that shows the output of quicksort in the following 4 situations (you may have separate script files if that is more convenient):

Electronic Submissions

Your program will be graded based on how it runs on the department’s Linux server, not how it runs on your computer. The submission should include the following items: