Skip to content

Instantly share code, notes, and snippets.

@cbourke
Created November 13, 2018 15:55
Show Gist options
  • Select an option

  • Save cbourke/09165e48926f0526b992cc6bfc741eee to your computer and use it in GitHub Desktop.

Select an option

Save cbourke/09165e48926f0526b992cc6bfc741eee to your computer and use it in GitHub Desktop.
auto-posted gist

Searching & Sorting

  • Processing data is a fundamental operation in Computer Science
  • Two fundamental operations in processing data are searching and sorting
  • Form the basis or preprocessing step of many algorithms
  • Large variety of algorithms have been developed

Searching

  • What are you searching? strings? numbers? files? objects?
  • How are you searching? (Not algorithmically, but by what criteria or criterion)
    • You could be searching for the first or the last or all instances of a particular criterion
    • You could search for extremal elements: minimum, maximum, median, i-th order statistic
    • There are many basic and advanced criteria that you can search on
  • We want to develop a general, abstract search algorithm
  • Pseudocode: a general, generic "code" language that allows you to specify a general abstract algorithm without getting to the specifics of any particular programming language
  • Linear search pseudocode: see book
  • Let's translate the linear search algorithm into actual C code
/**
 * This function searches the given array for
 * the given key value and returns the first
 * index at which it finds a matching element.
 *
 * This function returns -1 if no matching element
 * is found
 */
int linearSearch(const int *arr, int n, int key) {
  for(int i=0; i<n; i++) {
    if(a[i] == key) {
      return i;
    }
  }
  return -1;
}

//alternatively:
double * linearSearch(const double *arr, int n, double key) {
  for(int i=0; i<n; i++) {
    if(a[i] == key) {
      return &a[i];
    }
  }
  return NULL;
}
  • Problem: the above functions are only good for: arrays of integers.
  • If we wanted a version for arrays of double or char * (strings) or Student structures, etc.
  • Goal (eventually): design/implement ONE search/sort function to search/sort any type of variable

Binary Search

  • Suppose that the input array (collection) were sorted: how can you exploit this structure?
  • You can exploit this structure by checking the middle element $m$ (searching for $k$):
    • if equal $m = k$, you stop because you've found your needle
    • if $k &lt; m$: you "recursively" search on the left hand side
    • if $m &lt; k$: you recursively search on the right hand side
    • You stop when the size of the array is zero
int binarySearch(const int *a, int i, int j, int key) {

  if(i > j) {
     //no such element exists, the array is of "size" zero
     return -1;
  } else {
    int m = (i + j) / 2; //m is the middle index
    if(a[m] == key) {
      return m;
    } else if(key < a[m]) {
      //recurse on the left hand side
      binarySearch(a, i, m-1, key);
    } else if(a[m] < key) {
      //recurse on the right hand side
      binarySearch(a, m+1, j, key);
    }
  }

}

Analysis

  • Suppose you have an array of $n$ elements
  • How many comparisons does linear search perform to search the given array?
    • Best case scenario: you find the key at the first element: one comparison!
    • Worst case scenario: you find the key at the end of the array (or you don't find it at all): $n$ comparisons
    • Average case scenario: naive analysis: $$\frac{n + 1}{2} \approx \frac{n}{2}$$
    • Both the average and worst cases are linear
  • Binary Search?
    • Derivation: whitepaper and in the book
    • Only takes about $\log_2{(n)}$ comparisons

Perspective

  • Suppose you have a database of $10^{12}$ (1 trillion) elements

    • If unsorted you would have to use linear search to find a particular element: $$500 \times 10^{11}$$ 500 billion comparisons
    • If sorted (indexed) then you can exploit the sort using binary search: $$\log_2{(10^{12})} &lt; 40$$
  • Another perspective: suppose you "double" the input size of an array $$n \rightarrow 2n$$

    • linear search: the number of comparisons goes from $$\frac{n}{2} \rightarrow n$$ The number of comparisons doubles
    • binary search: $$\log_2{(n)} \rightarrow \log_2{(2n)} = \log_2{(n)} + \log_2{(2)} = \log_2{(n)} + 1$$

Sorting

Searching & Sorting in Practice

  • In practice: you don't generally roll your own searching and sorting algorithms unless you have a Very Good Reason to do so
  • In practice you use built-in search/sort algorithms/functions in a standard library
  • To avoid multiple solutions you use generic searching/sorting functions
  • Problem: a sorting algorithm/function may know how to "sort" but if it is generic, it may not know how to "order" sorting
  • You can use a comparator function that specifies how to "order" two elements, $a, b$
  • Given two elements $a, b$ which goes first? Are $a, b$ in order or out of order? Should they be swapped?
  • A comparator takes two elements $a, b$ as input and returns
    • something negative if $a &lt; b$
    • zero if $a = b$
    • something positive if $a &gt; b$

Comparator functions in C

  • In C, a comparator function has the following signature:

int cmp(const void *a, const void *b)

  • void * is a "generic void pointer": it can point to anything!
  • the compartor returns an integer according to the "contract" outlined above
  • Standard pattern:
    • cast the pointers to the specific type of data you want to compare
    • Use the state of the "object" or structure to determine the proper ordering
    • Return a valid integer value that expresses the order of the two elements
  • Best practices:
    • Use descriptive names for your functions
    • Be explicit in your comparisons
    • Avoid "tricks" (difference tricks)
    • Reuse comaprator functionality whenever possible
  • Examples:
    • Write a comparator to order integers in non-decreasing order
    • Write a comparator to order integers in non-increasing order
    • Write a comparator to order Student structures by NUID
    • Write a comparator to order Student structures by GPA
    • Write a comparator to order Student structures by last name/first name

Searching & Sorting

  • The standard C library provides a function:
void qsort(void *base,
           size_t nel,
           size_t size,
           int (*compar)(const void *, const void *));
  • base is the array of elements to be sorted (also note: it is not const)
  • nel is the number of elements in the array
  • size is the number of bytes each element in the array takes, usually you use: sizeof for this parameter
  • compar - a function pointer to a comparator function that defines the order that you want to sort in

Function Pointers

  • How can we "pass" a function to another function as an argument?
  • Generally these are referred to as "callbacks"
  • Ex: GUIs: you can define a button but how do you specify the code that runs when someone presses the button: you provide a callback
  • Ex: qsort needs a way to order elements, so it needs a comparator function
  • In C pointers can point to data stored in memory
  • But, a pointer is just a memory location
  • What else is stored in memory? Your code!
  • It makes sense that you can create a pointer that "points" to your code; points to a function in your code
//create a function pointer called ptrToFunc that can
//point to any function that returns
//an integer and takes three arguments:
//(int, double, char)
int (*ptrToFunc)(int, double, char) = NULL;

//declare a pointer that can point to math's sqrt function:
double (*ptrToSqrt)(double) = NULL;

//make our new pointer point to math's sqrt function:
ptrToSqrt = sqrt;

//cool: you can also invoke the function via its pointer:
double y = ptrToSqrt(2.0);

//a whole world of hurt:
sqrt = sin;

ptrToSqrt = sin;

Searching in practice

  • Searching: search.h contains linear search functions, lfind, lsearch
  • Standard library:
void * bsearch(const void *key,
               const void *base,
               size_t nel,
               size_t size,
               int (*compar) (const void *, const void *));

Sample Code

#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <string.h>

typedef struct {
  int nuid;
  char *firstName;
  char *lastName;
  double gpa;
  //forget the date for now
} Student;


int cmpInt(const void *a, const void *b) {
  const int *x = (const int *)a;
  const int *y = (const int *)b;
  if(*x < *y) {
    //avoid the difference trick: return *x - *y
    //suceptible to overflow
    return -1;
  } else if(*x > *y) {
    return 1;
  } else {
    return 0;
  }
  return 0;
}



int cmpIntDesc(const void *a, const void *b) {
  return cmpInt(b, a);
}

int cmpDouble(const void *a, const void *b) {
  const double *x = (const double *)a;
  const double *y = (const double *)b;
  if(*x < *y) {
    //avoid the difference trick: return *x - *y
    //.5 - .75 = -.25 -> 0
    return -1;
  } else if(*x > *y) {
    return 1;
  } else {
    return 0;
  }
  return 0;
}

int cmpStudentByNuid(const void *a, const void *b) {
  const Student *x = (const Student *) a;
  const Student *y = (const Student *) b;
  return cmpInt( &(x->nuid), &(y->nuid) );
}

int cmpStudentByGpa(const void *a, const void *b) {
  const Student *x = (const Student *) a;
  const Student *y = (const Student *) b;
  return cmpDouble( &(y->gpa), &(x->gpa) );
}

int cmpStudentByName(const void *a, const void *b) {
  const Student *x = (const Student *) a;
  const Student *y = (const Student *) b;

  int result = strcmp(x->lastName, y->lastName);
  if(result != 0) {
    return result;
  } else if(result == 0) {
    //tie: same last name!
    return strcmp(x->firstName, y->firstName);
    //there may still be a tie, but that's okay
  }

}

int main(void) {

  int n = 10;
  int arr[] = {6, 3, 9, 4, 1, 8, 3, 12, 2, 21};
  for(int i=0; i<n; i++) {
    printf("%d ", arr[i]);
  }
  printf("\n");
  qsort(arr, n, sizeof(int), cmpIntDesc);
  for(int i=0; i<n; i++) {
    printf("%d ", arr[i]);
  }
  printf("\n");

  int k = 8;
  int *ptr = (int *) bsearch(&k, arr, n, sizeof(int), cmpDouble);
  if(ptr != NULL) {
    printf("%d found at memory location %p, which contains %d\n", k, ptr, *ptr);
  } else {
    printf("not found!\n");
  }
  return 0;
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment