cbourke/Sorting.java

## searchingSorting-H.md

      
    Raw
  

              searchingSorting-H.md
            
          
    CSCE 155H - Computer Science I - Honors

Fall 2018

Searching & Sorting


Processing data is a fundamental operation in Computer Science
Two fundamental operations in processing data are searching and sorting
Form the basis or preprocessing step of many algorithms
Large variety of algorithms and algorithm strategies

Searching


What are you searching?  In general a "collection" (array, set, map, etc.), what are you searching for?  strings, numbers, files, objects, etc.

How are you searching?  (not an algorithm, but a criterion or criteria)

You could search for the first, last, or all instances of a particular criteria or key elements, $k$

You could search for extremal elements: minimum, maximum, median, or in general the $i$-th order statistic


For simplicity: we'll start by searching an array of integers for a particular one, $k$


Linear Search


Given a collection (array) of integers and a key element $k$, we want to search through the collection and find the first element that "matches" $k$ (ie "equals" $k$)
To keep things abstract and general, we will develop pseudocode to solve this problem
Pseudocode: see book
Basic Idea: iterate through the collection until you find the element you are looking for
Exercise: translate our pseudocode to C

/**
 * This function searches the given array for
 * the given key and returns the index at which
 * it finds it.
 *
 * It returns -1 if no such element exists.
 */
int linearSearch(const int *a, int n, int key) {
  for(int i=0; i<n; i++) {
    if(a[i] == key) {
      return i;
    }
  }
  return -1;
}

Problem: the above function is only good for arrays of integers.  What if you wanted to search arrays of:

Strings
doubles
a string (array of characters)


Goal (eventually): write ONE search function/algorithm that can be used on any type of variable!

Binary Search


How fast is linear search?
Can we improve it?  Is there a faster search?
Suppose that the input array is already sorted: how might you exploit this structure?
You can exploit the sorted structure by looking at the middle element:

Either you've found your needle
Or you know that you needle (if it exists) is in the lower half or
it is in the upper half


Each iteration/comparison cuts the size of the array in half each time
Classic "divide and conquer" style algorithmic approach which may be implemented recursively
This turns out to be much better than linear search

Analysis


Suppose we have an array of $n$ elements
How many comparisons does linear search perform to search for a particular key element $k$?

Best case scenario: you make 1 comparison, finding $k$ at index 0
Worst case scenario: you make $n$ comparisons and find it at the end (index $n-1$) or your search is unsuccessful
Average case scenario: beyond this course, but it leads to the same answer as the "naive" approach:
$$\frac{n+1}{2} \approx \frac{n}{2}$$

Worst versus average case: $n$ vs $\frac{n}{2}$, average case is "better", but in the grand scheme of things, both of them are linear, both have the "same" rate of growth


Binary Search:

How many comparisons in the worst case scenario do you make?

Derivation: whiteboard or book
It only takes $\log_2{(n)}$ comparisons


Perspective


Suppose you have a database of $10^{12}$ (1 trillion elements)

If unsorted, you have to search the database using linear search
You end up having to make (on average)
$$\frac{n}{2} = \frac{10^{12}}{2}$$

ie 500 billion comparisons
Suppose the database is "sorted" (indexed): you can then use binary search to search for a record:
$$\log_2{(10^{12})} = 12 \cdot \log_2{(10)} &lt; 40$$

Binary search: less than 40 comparisons


Another perspective: suppose that we "double" the size of the array or data being searched


How much worse or slower does each algorithm become?

Linear search: $n \rightarrow 2n$ it doubles the number of comparisons needed to search
Binary search: $n \rightarrow 2n$ you only need 1 additional comparison to search!


Take away: binary search is exponentially faster!


Sorting

Selection Sort


Basic Idea: you search through the array and find the minimal element, then swap it with the first element

Repeat, you find the next smallest element
and swap it with the second element
In general on the $i$-th iteration, the first $i-1$ elements have been sorted: search elements $a_{i}$ thru $a_{n}$ and find the smallest element, swap it with element $a_{i}$


Analysis


How many comparisons does selection sort make on an array of size $n$?
Whiteboard for analysis: setup a summation of how many comparisons the algorithm makes $\approx n^2$ comparisons
Selection sort is a (slow) quadratic sorting algorithm
Not a reasonable algorithm for even moderately large data sets

Quick Sort


Basic Idea:

Choose a pivot element in the array
Partition all elements around the pivot: place smaller elements to the left, larger elements to the right
Repeat by recursing on the left partition and on the right partition until...
you have either an empty array or an array of size 1
Examples and code can be found in the book


Analysis:

Worst case: $\approx n^2$ comparisons (if it were already sorted or you unluckily chose an extremal element as the pivot each time) - it essentially degrades into selection sort
Base/Average: $\approx n\log_2{(n)}$ (quasilinear performance)


Comparison


Selection sort: $n^2$ vs Quick Sort $n\log{(n)}$


$n\log{n}$ grows much smaller than $n^2$


Consider doubling the input size:

Selection sort: $n \rightarrow 2n$:
$$n^2 \rightarrow 4n^2$$

Doubling the input size makes it FOUR times slower!
Quick sort: $n \rightarrow 2n$:
$$n\log{n} \rightarrow \approx 2n\log{n}$$

Only doubles the number of comparisons


Consider sorting large arrays $10^{12}$


$10^{12}\times 10^{12}$ (1 septillion) operations for selection sort

$10^{12} \log{(10^{12})} &lt; 40 * 10^{12}$ for quicksort


Searching & Sorting in Practice


In practice, you don't roll your own searching/sorting algorithms/implementations unless you have a Very Good Reason to do so
In practice you use built-in searching & sorting functionality
But: you want to avoid multiple solutions, you want write generic code
Want a single function that is able to sort any and all types of data


A sorting algorithm only needs to know how to "sort", not how to "order"
Intead, a comparator is provided to the sorting function that is responsible for ordering
A comparator has the following general contract:
A comparator takes two elements $a, b$ as input and returns

something negative if $a &lt; b$

zero if $a = b$

something positive if $a &gt; b$


Searching & Sorting in C

Comparator Functions


In C, a comparator function has the following signature:

int cmp(const void *a, const void *b)

A void * is a generic void pointer that can point to any type of data
recall that malloc returns a void * (generic void pointer)
How to implement a comparator: standard pattern

Cast the pointers to a specific type of data you want to compare
Use the state of the object or structure to determine the proper ordering
You return a valid integer value that expresses that order


Best Practices:

Use descriptive names for your functions
Be explicit in your comparisons
Avoid "tricks"
Reuse comparator functionality when appropriate!


Searching & Sorting in C


Once you have a comparator, how do you use it?


The standard C library provides a function:


void qsort(void *base,
           size_t nel,
           size_t size,
           int (*compar)(const void *, const void *));

base is the array of elements to be sorted
nel is the number of elements in the array
size is the number of bytes each element in the array takes, ie you use sizeof()
compar - ?

Function Pointers


How can we "pass" a function to another function as an argument
Generally these are referred to as "callbacks"
Ex: GUIs: you can define a button, but how do you specify what happens when it is clicked: register a callback
In C, pointers can point to data stored in memory
What else is stored in memory?  Your code!
It makes sense that you can have a pointer point to a function
Sytnax:

//create a pointer that can point to a function that
//returns an integer and takes 3 inputs:
//int, double, char
int (*ptrToFunc)(int, double, char) = NULL;

//declare a function pointer that can point to sqrt
double (*ptrToSqrt)(double) = NULL;

//make ptrToSqrt "point" to sqrt:
ptrToSqrt = sqrt;

double y = ptrToSqrt(2.0);//1.41..

//likely the compiler will not let you, but you could try:
sqrt = sin;

ptrToSqrt = sin;

double y = ptrToSqrt(2.0);//.909 (sin(2))
Searching & Sorting in Java


Java does not have pointers, so it does not have function pointers!
Instead, you define Comparator<> objects that have the same basic contract:
Each compatator has a single method: compare() which returns an integer and takes 2 elements of the type it is parameterized for


## Sorting.java
package cse.unl;

import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class Sorting {


	public static final Comparator<Integer> cmpInt = new Comparator<Integer>() {

		@Override
		public int compare(Integer a, Integer b) {
			return a.compareTo(b);
		}

	};

	public static final Comparator<Integer> cmpIntDesc = new Comparator<Integer>() {

		@Override
		public int compare(Integer a, Integer b) {
			return b.compareTo(a);
		}

	};

	public static final Comparator<Student> cmpStudentByName = new Comparator<Student>() {

		@Override
		public int compare(Student s1, Student s2) {
			int result = s1.getLastName().compareTo(s2.getLastName());
			if(result != 0) {
				return result;
			} else {
				//look at their first name:
				result = s1.getFirstName().compareTo(s2.getFirstName());
				if(result != 0) {
					return result;
				} else {
					return s1.getDateOfBirth().compareTo(s2.getDateOfBirth());
				}
			}

		}

	};


	public static void main(String args[]) {


		List<Integer> numbers = Arrays.asList(6, 3, 9, 4, 1, 8, 3, 12, 2, 21);

		System.out.println(numbers);
		Collections.sort(numbers, cmpIntDesc);
		//TODO: sort
		System.out.println(numbers);

		List<Student> roster = new ArrayList<Student>();
		LocalDate dateOfBirth = LocalDate.of(1990, 7, 9);
		roster.add(new Student(35140602, "Chris", "Bourke", 4.0, dateOfBirth));
		roster.add(new Student(1234, "Scott", "Frost", 2.6, dateOfBirth));
		roster.add(new Student(1212, "Kyle", "Schwarber", 1.2, dateOfBirth));
		roster.add(new Student(44, "Anthony", "Rizzo", 4.4, dateOfBirth));
		for(Student s : roster) {
			System.out.println(s);
		}
		//TODO: sort
	}

}
	package cse.unl;

	import java.time.LocalDate;
	import java.util.ArrayList;
	import java.util.Arrays;
	import java.util.Collections;
	import java.util.Comparator;
	import java.util.List;

	public class Sorting {


	public static final Comparator<Integer> cmpInt = new Comparator<Integer>() {

	@Override
	public int compare(Integer a, Integer b) {
	return a.compareTo(b);
	}

	};

	public static final Comparator<Integer> cmpIntDesc = new Comparator<Integer>() {

	@Override
	public int compare(Integer a, Integer b) {
	return b.compareTo(a);
	}

	};

	public static final Comparator<Student> cmpStudentByName = new Comparator<Student>() {

	@Override
	public int compare(Student s1, Student s2) {
	int result = s1.getLastName().compareTo(s2.getLastName());
	if(result != 0) {
	return result;
	} else {
	//look at their first name:
	result = s1.getFirstName().compareTo(s2.getFirstName());
	if(result != 0) {
	return result;
	} else {
	return s1.getDateOfBirth().compareTo(s2.getDateOfBirth());
	}
	}

	}

	};


	public static void main(String args[]) {


	List<Integer> numbers = Arrays.asList(6, 3, 9, 4, 1, 8, 3, 12, 2, 21);

	System.out.println(numbers);
	Collections.sort(numbers, cmpIntDesc);
	//TODO: sort
	System.out.println(numbers);

	List<Student> roster = new ArrayList<Student>();
	LocalDate dateOfBirth = LocalDate.of(1990, 7, 9);
	roster.add(new Student(35140602, "Chris", "Bourke", 4.0, dateOfBirth));
	roster.add(new Student(1234, "Scott", "Frost", 2.6, dateOfBirth));
	roster.add(new Student(1212, "Kyle", "Schwarber", 1.2, dateOfBirth));
	roster.add(new Student(44, "Anthony", "Rizzo", 4.4, dateOfBirth));
	for(Student s : roster) {
	System.out.println(s);
	}
	//TODO: sort
	}

	}
No results found