- Processing data is a fundamental operation in Computer Science
- Two fundamental operations in processing data are searching and sorting
- Form the basis or preprocessing step of many algorithms
- Large variety of algorithms and algorithm strategies
- What are you searching? In general a "collection" (array, set, map, etc.), what are you searching for? strings, numbers, files, objects, etc.
-
How are you searching? (not an algorithm, but a criterion or criteria)
- You could search for the first, last, or all instances of a particular criteria or key elements,
$k$ - You could search for extremal elements: minimum, maximum, median, or in general the
$i$ -th order statistic
- You could search for the first, last, or all instances of a particular criteria or key elements,
- For simplicity: we'll start by searching an array of integers for a particular one,
$k$
- Given a collection (array) of integers and a key element
$k$ , we want to search through the collection and find the first element that "matches"$k$ (ie "equals"$k$ ) - To keep things abstract and general, we will develop pseudocode to solve this problem
- Pseudocode: see book
- Basic Idea: iterate through the collection until you find the element you are looking for
- Exercise: translate our pseudocode to C
/**
* This function searches the given array for
* the given key and returns the index at which
* it finds it.
*
* It returns -1 if no such element exists.
*/
int linearSearch(const int *a, int n, int key) {
for(int i=0; i<n; i++) {
if(a[i] == key) {
return i;
}
}
return -1;
}- Problem: the above function is only good for arrays of integers. What if you wanted to search arrays of:
- Strings
doubles- a string (array of characters)
- Goal (eventually): write ONE search function/algorithm that can be used on any type of variable!
- How fast is linear search?
- Can we improve it? Is there a faster search?
- Suppose that the input array is already sorted: how might you exploit this structure?
- You can exploit the sorted structure by looking at the middle element:
- Either you've found your needle
- Or you know that you needle (if it exists) is in the lower half or
- it is in the upper half
- Each iteration/comparison cuts the size of the array in half each time
- Classic "divide and conquer" style algorithmic approach which may be implemented recursively
- This turns out to be much better than linear search
- Suppose we have an array of
$n$ elements - How many comparisons does linear search perform to search for a particular key element
$k$ ?- Best case scenario: you make 1 comparison, finding
$k$ at index 0 - Worst case scenario: you make
$n$ comparisons and find it at the end (index$n-1$ ) or your search is unsuccessful - Average case scenario: beyond this course, but it leads to the same answer as the "naive" approach:
$$\frac{n+1}{2} \approx \frac{n}{2}$$ - Worst versus average case:
$n$ vs$\frac{n}{2}$ , average case is "better", but in the grand scheme of things, both of them are linear, both have the "same" rate of growth
- Best case scenario: you make 1 comparison, finding
- Binary Search:
- How many comparisons in the worst case scenario do you make?
- Derivation: whiteboard or book
- It only takes
$\log_2{(n)}$ comparisons
- How many comparisons in the worst case scenario do you make?
-
Suppose you have a database of
$10^{12}$ (1 trillion elements)- If unsorted, you have to search the database using linear search
- You end up having to make (on average)
$$\frac{n}{2} = \frac{10^{12}}{2}$$ - ie 500 billion comparisons
- Suppose the database is "sorted" (indexed): you can then use binary search to search for a record:
$$\log_2{(10^{12})} = 12 \cdot \log_2{(10)} < 40$$ - Binary search: less than 40 comparisons
-
Another perspective: suppose that we "double" the size of the array or data being searched
-
How much worse or slower does each algorithm become?
- Linear search:
$n \rightarrow 2n$ it doubles the number of comparisons needed to search - Binary search:
$n \rightarrow 2n$ you only need 1 additional comparison to search!
- Linear search:
-
Take away: binary search is exponentially faster!
- Basic Idea: you search through the array and find the minimal element, then swap it with the first element
- Repeat, you find the next smallest element and swap it with the second element
- In general on the
$i$ -th iteration, the first$i-1$ elements have been sorted: search elements$a_{i}$ thru$a_{n}$ and find the smallest element, swap it with element$a_{i}$
- How many comparisons does selection sort make on an array of size
$n$ ? - Whiteboard for analysis: setup a summation of how many comparisons the algorithm makes
$\approx n^2$ comparisons - Selection sort is a (slow) quadratic sorting algorithm
- Not a reasonable algorithm for even moderately large data sets
- Basic Idea:
- Choose a pivot element in the array
- Partition all elements around the pivot: place smaller elements to the left, larger elements to the right
- Repeat by recursing on the left partition and on the right partition until...
- you have either an empty array or an array of size 1
- Examples and code can be found in the book
- Analysis:
- Worst case:
$\approx n^2$ comparisons (if it were already sorted or you unluckily chose an extremal element as the pivot each time) - it essentially degrades into selection sort - Base/Average:
$\approx n\log_2{(n)}$ (quasilinear performance)
- Worst case:
-
Selection sort:
$n^2$ vs Quick Sort$n\log{(n)}$ -
$n\log{n}$ grows much smaller than$n^2$ -
Consider doubling the input size:
- Selection sort:
$n \rightarrow 2n$ :$$n^2 \rightarrow 4n^2$$ - Doubling the input size makes it FOUR times slower!
- Quick sort:
$n \rightarrow 2n$ :$$n\log{n} \rightarrow \approx 2n\log{n}$$ - Only doubles the number of comparisons
- Selection sort:
-
Consider sorting large arrays
$10^{12}$ -
$10^{12}\times 10^{12}$ (1 septillion) operations for selection sort -
$10^{12} \log{(10^{12})} < 40 * 10^{12}$ for quicksort
-
- In practice, you don't roll your own searching/sorting algorithms/implementations unless you have a Very Good Reason to do so
- In practice you use built-in searching & sorting functionality
- But: you want to avoid multiple solutions, you want write generic code
- Want a single function that is able to sort any and all types of data
- A sorting algorithm only needs to know how to "sort", not how to "order"
- Intead, a comparator is provided to the sorting function that is responsible for ordering
- A comparator has the following general contract:
- A comparator takes two elements
$a, b$ as input and returns- something negative if
$a < b$ - zero if
$a = b$ - something positive if
$a > b$
- something negative if
- In C, a comparator function has the following signature:
int cmp(const void *a, const void *b)
- A
void *is a generic void pointer that can point to any type of data - recall that
mallocreturns avoid *(generic void pointer) - How to implement a comparator: standard pattern
- Cast the pointers to a specific type of data you want to compare
- Use the state of the object or structure to determine the proper ordering
- You return a valid integer value that expresses that order
- Best Practices:
- Use descriptive names for your functions
- Be explicit in your comparisons
- Avoid "tricks"
- Reuse comparator functionality when appropriate!
-
Once you have a comparator, how do you use it?
-
The standard C library provides a function:
void qsort(void *base,
size_t nel,
size_t size,
int (*compar)(const void *, const void *));baseis the array of elements to be sortednelis the number of elements in the arraysizeis the number of bytes each element in the array takes, ie you usesizeof()compar- ?
- How can we "pass" a function to another function as an argument
- Generally these are referred to as "callbacks"
- Ex: GUIs: you can define a button, but how do you specify what happens when it is clicked: register a callback
- In C, pointers can point to data stored in memory
- What else is stored in memory? Your code!
- It makes sense that you can have a pointer point to a function
- Sytnax:
//create a pointer that can point to a function that
//returns an integer and takes 3 inputs:
//int, double, char
int (*ptrToFunc)(int, double, char) = NULL;
//declare a function pointer that can point to sqrt
double (*ptrToSqrt)(double) = NULL;
//make ptrToSqrt "point" to sqrt:
ptrToSqrt = sqrt;
double y = ptrToSqrt(2.0);//1.41..
//likely the compiler will not let you, but you could try:
sqrt = sin;
ptrToSqrt = sin;
double y = ptrToSqrt(2.0);//.909 (sin(2))- Java does not have pointers, so it does not have function pointers!
- Instead, you define
Comparator<>objects that have the same basic contract: - Each compatator has a single method:
compare()which returns an integer and takes 2 elements of the type it is parameterized for
