Skip to content

Instantly share code, notes, and snippets.

@cbourke
Created November 14, 2018 16:00
Show Gist options
  • Select an option

  • Save cbourke/b66c29efe568a13ae5d711227382ff97 to your computer and use it in GitHub Desktop.

Select an option

Save cbourke/b66c29efe568a13ae5d711227382ff97 to your computer and use it in GitHub Desktop.
auto-posted gist

CSCE 155H - Computer Science I - Honors

Fall 2018

Searching & Sorting

  • Processing data is a fundamental operation in Computer Science
  • Two fundamental operations in processing data are searching and sorting
  • Form the basis or preprocessing step of many algorithms
  • Large variety of algorithms and algorithm strategies

Searching

  • What are you searching? In general a "collection" (array, set, map, etc.), what are you searching for? strings, numbers, files, objects, etc.
  • How are you searching? (not an algorithm, but a criterion or criteria)
    • You could search for the first, last, or all instances of a particular criteria or key elements, $k$
    • You could search for extremal elements: minimum, maximum, median, or in general the $i$-th order statistic
  • For simplicity: we'll start by searching an array of integers for a particular one, $k$

Linear Search

  • Given a collection (array) of integers and a key element $k$, we want to search through the collection and find the first element that "matches" $k$ (ie "equals" $k$)
  • To keep things abstract and general, we will develop pseudocode to solve this problem
  • Pseudocode: see book
  • Basic Idea: iterate through the collection until you find the element you are looking for
  • Exercise: translate our pseudocode to C
/**
 * This function searches the given array for
 * the given key and returns the index at which
 * it finds it.
 *
 * It returns -1 if no such element exists.
 */
int linearSearch(const int *a, int n, int key) {
  for(int i=0; i<n; i++) {
    if(a[i] == key) {
      return i;
    }
  }
  return -1;
}
  • Problem: the above function is only good for arrays of integers. What if you wanted to search arrays of:
    • Strings
    • doubles
    • a string (array of characters)
  • Goal (eventually): write ONE search function/algorithm that can be used on any type of variable!

Binary Search

  • How fast is linear search?
  • Can we improve it? Is there a faster search?
  • Suppose that the input array is already sorted: how might you exploit this structure?
  • You can exploit the sorted structure by looking at the middle element:
    • Either you've found your needle
    • Or you know that you needle (if it exists) is in the lower half or
    • it is in the upper half
  • Each iteration/comparison cuts the size of the array in half each time
  • Classic "divide and conquer" style algorithmic approach which may be implemented recursively
  • This turns out to be much better than linear search

Analysis

  • Suppose we have an array of $n$ elements
  • How many comparisons does linear search perform to search for a particular key element $k$?
    • Best case scenario: you make 1 comparison, finding $k$ at index 0
    • Worst case scenario: you make $n$ comparisons and find it at the end (index $n-1$) or your search is unsuccessful
    • Average case scenario: beyond this course, but it leads to the same answer as the "naive" approach: $$\frac{n+1}{2} \approx \frac{n}{2}$$
    • Worst versus average case: $n$ vs $\frac{n}{2}$, average case is "better", but in the grand scheme of things, both of them are linear, both have the "same" rate of growth
  • Binary Search:
    • How many comparisons in the worst case scenario do you make?
      • Derivation: whiteboard or book
      • It only takes $\log_2{(n)}$ comparisons

Perspective

  • Suppose you have a database of $10^{12}$ (1 trillion elements)

    • If unsorted, you have to search the database using linear search
    • You end up having to make (on average) $$\frac{n}{2} = \frac{10^{12}}{2}$$
    • ie 500 billion comparisons
    • Suppose the database is "sorted" (indexed): you can then use binary search to search for a record: $$\log_2{(10^{12})} = 12 \cdot \log_2{(10)} &lt; 40$$
    • Binary search: less than 40 comparisons
  • Another perspective: suppose that we "double" the size of the array or data being searched

  • How much worse or slower does each algorithm become?

    • Linear search: $n \rightarrow 2n$ it doubles the number of comparisons needed to search
    • Binary search: $n \rightarrow 2n$ you only need 1 additional comparison to search!
  • Take away: binary search is exponentially faster!

Sorting

Selection Sort

  • Basic Idea: you search through the array and find the minimal element, then swap it with the first element
    • Repeat, you find the next smallest element and swap it with the second element
    • In general on the $i$-th iteration, the first $i-1$ elements have been sorted: search elements $a_{i}$ thru $a_{n}$ and find the smallest element, swap it with element $a_{i}$

Analysis

  • How many comparisons does selection sort make on an array of size $n$?
  • Whiteboard for analysis: setup a summation of how many comparisons the algorithm makes $\approx n^2$ comparisons
  • Selection sort is a (slow) quadratic sorting algorithm
  • Not a reasonable algorithm for even moderately large data sets

Quick Sort

  • Basic Idea:
    • Choose a pivot element in the array
    • Partition all elements around the pivot: place smaller elements to the left, larger elements to the right
    • Repeat by recursing on the left partition and on the right partition until...
    • you have either an empty array or an array of size 1
    • Examples and code can be found in the book
  • Analysis:
    • Worst case: $\approx n^2$ comparisons (if it were already sorted or you unluckily chose an extremal element as the pivot each time) - it essentially degrades into selection sort
    • Base/Average: $\approx n\log_2{(n)}$ (quasilinear performance)

Comparison

  • Selection sort: $n^2$ vs Quick Sort $n\log{(n)}$

  • $n\log{n}$ grows much smaller than $n^2$

  • Consider doubling the input size:

    • Selection sort: $n \rightarrow 2n$: $$n^2 \rightarrow 4n^2$$
    • Doubling the input size makes it FOUR times slower!
    • Quick sort: $n \rightarrow 2n$: $$n\log{n} \rightarrow \approx 2n\log{n}$$
    • Only doubles the number of comparisons
  • Consider sorting large arrays $10^{12}$

    • $10^{12}\times 10^{12}$ (1 septillion) operations for selection sort
    • $10^{12} \log{(10^{12})} &lt; 40 * 10^{12}$ for quicksort

Searching & Sorting in Practice

  • In practice, you don't roll your own searching/sorting algorithms/implementations unless you have a Very Good Reason to do so
  • In practice you use built-in searching & sorting functionality
  • But: you want to avoid multiple solutions, you want write generic code
  • Want a single function that is able to sort any and all types of data

sorting

  • A sorting algorithm only needs to know how to "sort", not how to "order"
  • Intead, a comparator is provided to the sorting function that is responsible for ordering
  • A comparator has the following general contract:
  • A comparator takes two elements $a, b$ as input and returns
    • something negative if $a &lt; b$
    • zero if $a = b$
    • something positive if $a &gt; b$

Searching & Sorting in C

Comparator Functions

  • In C, a comparator function has the following signature:

int cmp(const void *a, const void *b)

  • A void * is a generic void pointer that can point to any type of data
  • recall that malloc returns a void * (generic void pointer)
  • How to implement a comparator: standard pattern
    1. Cast the pointers to a specific type of data you want to compare
    2. Use the state of the object or structure to determine the proper ordering
    3. You return a valid integer value that expresses that order
  • Best Practices:
    • Use descriptive names for your functions
    • Be explicit in your comparisons
    • Avoid "tricks"
    • Reuse comparator functionality when appropriate!

Searching & Sorting in C

  • Once you have a comparator, how do you use it?

  • The standard C library provides a function:

void qsort(void *base,
           size_t nel,
           size_t size,
           int (*compar)(const void *, const void *));
  • base is the array of elements to be sorted
  • nel is the number of elements in the array
  • size is the number of bytes each element in the array takes, ie you use sizeof()
  • compar - ?

Function Pointers

  • How can we "pass" a function to another function as an argument
  • Generally these are referred to as "callbacks"
  • Ex: GUIs: you can define a button, but how do you specify what happens when it is clicked: register a callback
  • In C, pointers can point to data stored in memory
  • What else is stored in memory? Your code!
  • It makes sense that you can have a pointer point to a function
  • Sytnax:
//create a pointer that can point to a function that
//returns an integer and takes 3 inputs:
//int, double, char
int (*ptrToFunc)(int, double, char) = NULL;

//declare a function pointer that can point to sqrt
double (*ptrToSqrt)(double) = NULL;

//make ptrToSqrt "point" to sqrt:
ptrToSqrt = sqrt;

double y = ptrToSqrt(2.0);//1.41..

//likely the compiler will not let you, but you could try:
sqrt = sin;

ptrToSqrt = sin;

double y = ptrToSqrt(2.0);//.909 (sin(2))

Searching & Sorting in Java

  • Java does not have pointers, so it does not have function pointers!
  • Instead, you define Comparator<> objects that have the same basic contract:
  • Each compatator has a single method: compare() which returns an integer and takes 2 elements of the type it is parameterized for
package cse.unl;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
public class Sorting {
public static final Comparator<Integer> cmpInt = new Comparator<Integer>() {
@Override
public int compare(Integer a, Integer b) {
return a.compareTo(b);
}
};
public static final Comparator<Integer> cmpIntDesc = new Comparator<Integer>() {
@Override
public int compare(Integer a, Integer b) {
return b.compareTo(a);
}
};
public static final Comparator<Student> cmpStudentByName = new Comparator<Student>() {
@Override
public int compare(Student s1, Student s2) {
int result = s1.getLastName().compareTo(s2.getLastName());
if(result != 0) {
return result;
} else {
//look at their first name:
result = s1.getFirstName().compareTo(s2.getFirstName());
if(result != 0) {
return result;
} else {
return s1.getDateOfBirth().compareTo(s2.getDateOfBirth());
}
}
}
};
public static void main(String args[]) {
List<Integer> numbers = Arrays.asList(6, 3, 9, 4, 1, 8, 3, 12, 2, 21);
System.out.println(numbers);
Collections.sort(numbers, cmpIntDesc);
//TODO: sort
System.out.println(numbers);
List<Student> roster = new ArrayList<Student>();
LocalDate dateOfBirth = LocalDate.of(1990, 7, 9);
roster.add(new Student(35140602, "Chris", "Bourke", 4.0, dateOfBirth));
roster.add(new Student(1234, "Scott", "Frost", 2.6, dateOfBirth));
roster.add(new Student(1212, "Kyle", "Schwarber", 1.2, dateOfBirth));
roster.add(new Student(44, "Anthony", "Rizzo", 4.4, dateOfBirth));
for(Student s : roster) {
System.out.println(s);
}
//TODO: sort
}
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment