1275 lines
35 KiB
Org Mode
1275 lines
35 KiB
Org Mode
#+AUTHOR: Joseph Ferano
|
|
#+STARTUP: overview
|
|
#+OPTIONS: ^:{}
|
|
|
|
* Chapter 1 - Introduction
|
|
|
|
** 1.1 Robots
|
|
|
|
An /algorithm/ is a procedure that takes any of the possible input instances
|
|
and transforms it to the desired output.
|
|
|
|
The Robot arm problem is presented where it is trying to solder contact points
|
|
and visit all points in the shortest path possible.
|
|
|
|
The first algorithm considered is ~NearestNeighbor~. However this is a naïve, and
|
|
the arm hopscotches around
|
|
|
|
Next we consider ~ClosestPair~, but that too misses in certain instances.
|
|
|
|
Next is ~OptimalTSP~, which will always give the correct result because it
|
|
enumerates all possible combinations and return the one with the shortest
|
|
path. For 20 points however, the algorithm grows at a right of O(n!). TSP stands
|
|
for Traveling Salesman Problem.
|
|
|
|
*** TODO Implement NearestNeighbor
|
|
*** TODO Implement ClosestPair
|
|
*** TODO Implement OptimalTSP for N < 8
|
|
|
|
** 1.2 Right Jobs
|
|
Here we are introduced to the Movie Scheduling Problem where we try to pick the
|
|
largest amount of mutually non-overlapping movies an actor can pick to maximize
|
|
their time. Algorithms considered are ~EarliestJobFirst~ to start as soon as
|
|
possible, and then ~ShortestJobFirst~ to be done with it the quickest, but both
|
|
fail to find optimal solutions.
|
|
|
|
~ExhaustiveScheduling~ grows at a rate of O(2^{n}) which is much better than O(n!) as
|
|
in the previous problem. Finally ~OptimalScheduling~ improves efficiency by first
|
|
removing candidates that are overlapping such that it doesn't even compare them.
|
|
|
|
** 1.3 Correctness
|
|
|
|
It's important to be clear about the steps in pseudocode when designing
|
|
algorithms on paper. There are important things to consider about algorithm
|
|
correctness;
|
|
|
|
- Verifiability
|
|
- Simplicity
|
|
- Think small
|
|
- Think exhaustively
|
|
- Hunt for the weakness
|
|
- Go for a tie
|
|
- Seek extremes
|
|
|
|
Other tecniques include *Induction*, *Recursion*, and *Summations*.
|
|
|
|
** 1.4 Modeling
|
|
|
|
Most algorithms are designed to work on rigorously defined abstract
|
|
structures. These fundamental structures include;
|
|
|
|
- Permutations
|
|
- Subsets
|
|
- Trees
|
|
- Graphs
|
|
- Points
|
|
- Polygons
|
|
- Strings
|
|
|
|
** 1.5-1.6 War Story about Psychics
|
|
|
|
|
|
* Chapter 2 - Algorithm Analyses
|
|
|
|
** 2.1 RAM Model of Computation
|
|
|
|
This is a simpler kind of Big Oh where
|
|
|
|
- Each simple operation is 1 step
|
|
- Loops and Subroutines are composition of simple operations
|
|
- Each memory access is one time step
|
|
|
|
Like flat earth theory, in practice we use it when engineering certain
|
|
structures because we don't take into account the curvature of the Earth.
|
|
|
|
We can already apply the concept of worst, average, and best case to this model.
|
|
|
|
** 2.2 Big Oh
|
|
|
|
The previous model often requires concrete implementations to actually measure
|
|
correctly, so instead Big Oh gives us a better, simpler framework for discussing
|
|
the relative performance between algorithms. It ignores factors that don't
|
|
impact how algorithms scale.
|
|
|
|
** 2.3 Growth Rates and Dominance Relations
|
|
|
|
These are the functions that occur in algorithm analyses;
|
|
|
|
- *Constant O(1)*
|
|
Hashtable look up, array look up, consing a list
|
|
- *Logarithmic O(log n)*
|
|
Binary Search
|
|
- *Linear O(n)*
|
|
Iterating over a list
|
|
- *Superlinear O(n log n)*
|
|
Quicksort and Mergesort
|
|
- *Quadratic* O(n^{2})
|
|
Insertion Sort and Selection Sort
|
|
- *Cubic* O(n^{3})
|
|
Some dynamic programming problems
|
|
- *Exponential* O(C^{n}^{}) *c for any constant c > 1*
|
|
Enumerate all subsets
|
|
- *Factorial O(n!)*
|
|
Generating all permutations or orderings
|
|
|
|
*Notes*:
|
|
- O(n!) algorithms become useless for anything n >= 20
|
|
- O(2^{n}) algorithms become impractical for anything n > 40
|
|
- O(n^{2}^{}) algorithms start deteriorating after n > 10,000, a million is hopeless
|
|
- O(n^{2}^{}) and O(n log n) Are fine up to 1 billion
|
|
|
|
** 2.4 Working with Big Oh
|
|
|
|
Apparently you can do arithmetic on the Big Oh functions
|
|
|
|
** 2.5 Efficiency
|
|
|
|
*** Selection Sort
|
|
|
|
**** OCaml
|
|
|
|
Really reinventing the wheel on this one...
|
|
|
|
#+begin_src ocaml
|
|
let max_element = function
|
|
| [] -> invalid_arg "empty list"
|
|
| x::xs ->
|
|
let rec f acc = function
|
|
| [] -> acc
|
|
| x::xs -> f (if x > acc then x else acc) xs
|
|
in f x xs
|
|
|
|
let remove item list =
|
|
let rec f acc item = function
|
|
| [] -> List.rev acc
|
|
| x::xs -> if item = x then (List.rev acc) @ xs else f (x::acc) item xs
|
|
in f [] item list
|
|
|
|
let selection_sort list =
|
|
let rec f acc = function
|
|
| [] -> acc
|
|
| xs ->
|
|
let m = max xs
|
|
in f (m::acc) (remove m xs)
|
|
in f [] list
|
|
#+end_src
|
|
|
|
**** Python
|
|
|
|
#+begin_src python
|
|
def selection_sort(arr):
|
|
sorted_list = []
|
|
for i in range(len(arr)):
|
|
max = arr[0]
|
|
for count, value in enumerate(arr):
|
|
if value > max:
|
|
max = value
|
|
sorted_list.append(max)
|
|
arr.remove(max)
|
|
return sorted_list
|
|
|
|
selection_sort([2,1,5,3,4])
|
|
#+end_src
|
|
|
|
**** C
|
|
|
|
#+begin_src C :includes stdio.h
|
|
void print_nums(int *nums, int length) {
|
|
for (int i = 0; i < length; i++) {
|
|
printf("%d,", nums[i]);
|
|
}
|
|
printf("\n");
|
|
}
|
|
|
|
void selection_sort(int *nums, int length) {
|
|
int i, j;
|
|
int min_idx;
|
|
for (i = 0; i < length; i++) {
|
|
print_nums(nums, length);
|
|
min_idx = i;
|
|
for (j = i+1; j < length; j++) {
|
|
if (nums[j] < nums[min_idx]) {
|
|
min_idx = j;
|
|
}
|
|
}
|
|
int temp = nums[min_idx];
|
|
nums[min_idx] = nums[i];
|
|
nums[i] = temp;
|
|
}
|
|
}
|
|
|
|
int nums[9] = { 2, 4, 9, 1, 3, 8, 5, 7, 6 };
|
|
selection_sort(nums, 9);
|
|
#+end_src
|
|
|
|
|
|
*** Insertion Sort
|
|
**** C
|
|
|
|
#+begin_src C :includes stdio.h
|
|
void insertion_sort(int *nums, int len) {
|
|
int i, j;
|
|
for (i = 1; i < len; i++) {
|
|
j = i;
|
|
while (nums[j] < nums[j -1] && j > 0) {
|
|
int temp = nums[j];
|
|
nums[j] = nums[j - 1];
|
|
nums[j - 1] = temp;
|
|
j--;
|
|
}
|
|
}
|
|
}
|
|
|
|
int nums[8] = {1,4,5,2,8,3,7,9};
|
|
insertion_sort(nums, 8);
|
|
for (int i = 0; i < 8; i++) {
|
|
printf("%d", nums[i]);
|
|
}
|
|
#+end_src
|
|
|
|
#+RESULTS:
|
|
: 12345789
|
|
|
|
|
|
*** TODO String Pattern Matching
|
|
*** TODO Matrix Multiplication
|
|
|
|
** 2.6 Logarithms
|
|
|
|
Logarithms are the inverse of exponents. Binary search is great for sorted
|
|
lists. There are applications related to fast exponentiation, binary trees,
|
|
harmonic numbers, and criminal sentencing.
|
|
|
|
** 2.7 Properties of Logarithms
|
|
|
|
Common bases for logarithms include 2, /e/, and 10. The base of the logarithm has
|
|
no real impact on the growth rate; log_{2} and log_{3} are roughly equivalent.
|
|
|
|
** 2.8 War Story Pyramids
|
|
|
|
Cool story bro
|
|
|
|
|
|
** 2.9 Advanced Analysis
|
|
|
|
Some advanced stuff
|
|
- *Inverse Ackerman's Function*
|
|
Union-Find data structure
|
|
- *log log n*
|
|
Binary search on a sorted array of only log n items
|
|
- *log n / log log n*
|
|
- log^{2} n
|
|
- sqrt(n)
|
|
|
|
There are also limits and dominance relations
|
|
|
|
* Chapter 3 - Data Structures
|
|
|
|
** 3.1 Contiguous vs Linked Data Structures
|
|
|
|
Advantages of Arrays
|
|
- Constant-time access given the index
|
|
- Space efficiency
|
|
- Memory locality
|
|
|
|
Downsides is that they don't grow but dynamic arrays fix this by allocating a
|
|
new bigger array when needed.
|
|
|
|
Advantages of Linked Structures
|
|
- No overflow, can keep growing
|
|
- Insertions/Deletions are simpler
|
|
- A collection of pointers are lighter than contiguous data
|
|
|
|
However, pointers require extra space for storing pointer fields
|
|
|
|
** 3.2 Stacks and Queues
|
|
|
|
*** Stacks
|
|
/(PUSH, /POP/) LIFO, useful in executing recursive algorithms.
|
|
|
|
#+begin_src C :includes stdlib.h stdio.h stdbool.h
|
|
typedef struct Node {
|
|
struct Node* next;
|
|
/* void *data; */
|
|
int data;
|
|
} Node;
|
|
|
|
typedef struct Stack {
|
|
Node* top;
|
|
int length;
|
|
} Stack;
|
|
|
|
/* void stack_create(void* data) { */
|
|
Stack *stack_create(int data) {
|
|
Stack *stack = malloc(sizeof(Stack));
|
|
Node *node = malloc(sizeof(Node));
|
|
node->data = data;
|
|
node->next = NULL;
|
|
stack->top = node;
|
|
stack->length = 1;
|
|
return stack;
|
|
}
|
|
|
|
int stack_pop(Stack *stack) {
|
|
Node *top = stack->top;
|
|
Node *next = top->next;
|
|
if (stack->length > 0 && next != NULL) {
|
|
stack->top = next;
|
|
stack->length--;
|
|
int value = top->data;
|
|
free(top);
|
|
return value;
|
|
} else {
|
|
// A better API design would be to return a bool and the int as a pointer
|
|
// Although once we switch away from int and use void pointers, might not be needed
|
|
return -1;
|
|
}
|
|
}
|
|
|
|
/* void stack_push(Stack *stack, void *data) { */
|
|
void stack_push(Stack *stack, int data) {
|
|
Node *node = malloc(sizeof(Node));
|
|
Node *top = stack->top;
|
|
node->data = data;
|
|
node->next = top;
|
|
stack->top = node;
|
|
stack->length++;
|
|
}
|
|
|
|
/* void* stack_peak(Stack *stack) { */
|
|
int stack_peak(Stack *stack) {
|
|
return stack->top->data;
|
|
}
|
|
|
|
void stack_print(Stack *stack) {
|
|
Node *current = stack->top;
|
|
int i = 0;
|
|
while (current != NULL) {
|
|
printf("Stack at %d: %d\n", ++i, current->data);
|
|
current = current->next;
|
|
}
|
|
printf("------------------\n");
|
|
}
|
|
#+end_src
|
|
|
|
*** Queues
|
|
(/ENQUEUE/, /DEQUEUE/) FIFO, useful for breadth-first searches in graphs.
|
|
|
|
#+name: queue
|
|
#+begin_src C :includes stdio.h stdlib.h
|
|
#define QBUFSIZE 64
|
|
#define T int
|
|
|
|
struct queue {
|
|
T buf[QBUFSIZE];
|
|
int start;
|
|
int end;
|
|
int size;
|
|
};
|
|
|
|
struct queue *q_create() {
|
|
struct queue *q = calloc(1, sizeof(struct queue));
|
|
q->start = 0;
|
|
q->end = 0;
|
|
}
|
|
|
|
void q_enqueue(struct queue *q, T item) {
|
|
if (q->size == QBUFSIZE) {
|
|
printf("Queue Overflow");
|
|
exit(1);
|
|
}
|
|
q->buf[q->end] = item;
|
|
q->end = ++q->end % QBUFSIZE;
|
|
q->size++;
|
|
}
|
|
|
|
T q_dequeue(struct queue *q) {
|
|
if (q->size == 0) {
|
|
printf("Queue empty");
|
|
exit(1);
|
|
}
|
|
T item = q->buf[q->start++];
|
|
q->size--;
|
|
return item;
|
|
}
|
|
|
|
T q_peek(struct queue *q) {
|
|
if (q->size == 0) {
|
|
printf("Queue empty");
|
|
exit(1);
|
|
}
|
|
return q->buf[q->start];
|
|
}
|
|
|
|
bool q_empty(struct queue *q) {
|
|
return q->size <= 0;
|
|
}
|
|
|
|
void q_print(struct queue *q) {
|
|
printf("Qeueu_Elements: ");
|
|
for (int i = 0; i < q->size; i++) {
|
|
printf("%i-", q->buf[(i + q->start) % QBUFSIZE]);
|
|
}
|
|
printf("\n");
|
|
}
|
|
|
|
// struct queue *q = q_create();
|
|
// q_enqueue(q, 1);
|
|
// q_enqueue(q, 2);
|
|
// q_enqueue(q, 3);
|
|
// q_enqueue(q, 4);
|
|
// q_dequeue(q);
|
|
// q_dequeue(q);
|
|
// q_enqueue(q, 5);
|
|
// q_enqueue(q, 6);
|
|
// q_print(q);
|
|
#+end_src
|
|
|
|
|
|
** 3.3 Dictionaries
|
|
|
|
Not just hashtables but anything that can provide access to data by
|
|
content. Some dictionaries implement trees instead of hashing. Both contiguous
|
|
and linked structures can be used with tradeoffs between them.
|
|
|
|
** 3.4 Binary Search Trees
|
|
|
|
BSTs have a parent and two child nodes; left and right. They support insertion,
|
|
deletion, traversal. Interestingly, Min and Max can be calculated by seeking the
|
|
leftmost and rightmost node respectively, provided the tree is balanced. BSTs
|
|
can have good performance for most cases so long as they remain balanced. O(h)
|
|
refers to the time being the height of the BST.
|
|
|
|
** 3.5 Priority Queues
|
|
|
|
They allow new elements to enter a system at arbitrary intervals.
|
|
|
|
** 3.6 War Story
|
|
|
|
Rather than storing all of the vertices of a mesh, you can share them between
|
|
the different triangles, but connecting all vertices requires visiting each
|
|
vertice once, a Hamiltonian path, but that's NP-Complete. Using a greedy
|
|
heuristic where it tries to always grab the best possible thing first. Then
|
|
using a priority queue, they were able to reduce the running time by several
|
|
orders of magnitude compared to the naïve approach.
|
|
|
|
** 3.7 Hasing and Strings
|
|
|
|
Take a map to a big int, use modulo to spin around, and if /m/ is a large prime
|
|
you'll get fairly uniform distribution. The two main ways to solve collisions
|
|
are /Chaining/ and /Open Addressing/. Chaining is where each bucket has a linked
|
|
list and collisions are appended. Open addressing looks for adjacent empty
|
|
buckets.
|
|
|
|
Hashing is also useful when dealing strings, in particular, substring pattern
|
|
matching. Overlaying pattern /p/ over every position in text /t/ would result in
|
|
O(m*n). With hashing, you can hash the slices of /t/ and compare them to /p/, and
|
|
get slower growth. This is called the *Rabin-Karp algorithm*. While
|
|
false-positives may occur, a good hashing function would avoid this.
|
|
|
|
Hashing is so important Yahoo! employs them extensively.
|
|
|
|
** 3.8 Specialized Data Structures
|
|
|
|
These include;
|
|
|
|
- String
|
|
Characters in an array
|
|
- Geometric
|
|
Collection of points and regions/polygons
|
|
- Graph
|
|
Using adjacency matrices
|
|
- Set
|
|
Dicionaries and bit vectors
|
|
|
|
|
|
** 3.9 War Story
|
|
|
|
They were trying to implement sequencing by hybridization (SBH), but ran into
|
|
issues when they used a BST. Then they tried a hashtable, then a trie. Finally
|
|
what worked was a compressed suffix tree.
|
|
|
|
|
|
** Exercises
|
|
|
|
*** 3.42
|
|
|
|
Reverse the words in a sentence—that is, “My name is Chris” becomes “Chris is
|
|
name My.” Optimize for time and space.
|
|
|
|
#+begin_src C :includes stdlib.h stdio.h string.h
|
|
void reverse_word(char *string, int length) {
|
|
for (int i = 0; i < length / 2; i++) {
|
|
char temp = string[i];
|
|
string[i] = string[length - 1 - i];
|
|
string[length - 1 - i] = temp;
|
|
}
|
|
}
|
|
|
|
void reverse_words(char *string, int length) {
|
|
printf("Before: %s\n", string);
|
|
reverse_word(string, length);
|
|
printf("After: %s\n", string);
|
|
int start = 0;
|
|
for (int i = 0; i < length; i++) {
|
|
if (string[i] == ' ' || i == length - 1) {
|
|
if (i == length - 1) i++;
|
|
reverse_word(&string[start], i - start);
|
|
start = i + 1;
|
|
}
|
|
}
|
|
}
|
|
|
|
char str[] = "My name is Chris";
|
|
reverse_words(str, strlen(str));
|
|
printf("Final: %s\n", str);
|
|
#+end_src
|
|
|
|
#+RESULTS:
|
|
| Before: | My | name | is | Chris |
|
|
| After: | sirhC | si | eman | yM |
|
|
| Final: | Chris | is | name | My |
|
|
|
|
* Chapter 4 - Sorting and Searching
|
|
|
|
** 4.1 Applications of Sorting
|
|
|
|
Apparently sorting is a big deal. There are a lot of problems that can be solved by using a sorted
|
|
list, for example; closest pair, searching, frequency distribution, convex hulls, etc;
|
|
|
|
The fastest sorting algorithms are ~n log n~, here is how it scales compared to an algorithm with
|
|
quadratic growth. Even crazier is that this table divides it by 4 so it's not that ridiculous.
|
|
|
|
| n | n^{2}/4 | n lg n |
|
|
|---------+---------------+-----------|
|
|
| 10 | 25 | 33 |
|
|
| 100 | 2,500 | 664 |
|
|
| 1,000 | 250,000 | 9,965 |
|
|
| 10,000 | 25,000,000 | 132,877 |
|
|
| 100,000 | 2,500,000,000 | 1,660,960 |
|
|
|
|
Whenever you have an algorithmic problem, don't be afraid to use sorting, because if it results in
|
|
the ability to then do a linear scan, at worst is becomes ~2n log n~, which in the end is just ~n log n~
|
|
|
|
** 4.2 Pragmatics of Sorting
|
|
|
|
There are some important things to consider when sorting; the order, keys and their data, equality,
|
|
and non-numerical data. For resolving these, we would use /comparison functions/. Here is the
|
|
signature of the C function for quicksort
|
|
|
|
#+begin_src C :include stdlib.h
|
|
void qsort(void *base, size_t nitems, size_t size, int (*compar)(const void *, const void*))
|
|
#+end_src
|
|
|
|
It takes a comparison function that might look like this;
|
|
|
|
#+begin_src C :include stdlib.h
|
|
int compare(int *i, int *j) {
|
|
if (*i > *j) return 1;
|
|
if (*i < *j) return -1;
|
|
return 0;
|
|
}
|
|
#+end_src
|
|
|
|
|
|
** 4.3 Heapsort
|
|
|
|
An O(n^{2}) sorting algorithm like Selection Sort can be made faster by using the right data structure,
|
|
in this case either a heap or a balanced binary tree. The first initial construction will take
|
|
one O(n), but subsequent operations within the loop will now take O(log n) rather than O(n), giving
|
|
a final complexity of O(n log n).
|
|
|
|
|
|
*** Heaps
|
|
|
|
They can either be min or max heaps and the root node will dominate its children in min-ness or
|
|
max-ness. It's also different in that it can be implemented in an array and still be reasonably
|
|
conservative with it's space complexity. However it should be noted that searching isn't efficient
|
|
because the nodes aren't guaranteed to be ordered, only the relationship between the parent/child.
|
|
|
|
Here's the full heap implementation;
|
|
|
|
#+begin_src C :includes stdio.h stdlib.h
|
|
#define item_type int
|
|
|
|
struct priority_queue {
|
|
// TODO: Why did I make this a pointer?
|
|
item_type *q;
|
|
int len;
|
|
int capacity;
|
|
};
|
|
|
|
struct priority_queue *pq_create(int capacity) {
|
|
item_type *q = calloc(capacity, sizeof(item_type));
|
|
struct priority_queue *pq = malloc(sizeof(struct priority_queue));
|
|
pq->q = q;
|
|
pq->len = 0;
|
|
pq->capacity = capacity;
|
|
}
|
|
|
|
int pq_parent(int n) {
|
|
if (n == 1) return -1;
|
|
else return ((int) n/2);
|
|
}
|
|
|
|
int pq_young_child(int n) {
|
|
return 2 * n;
|
|
}
|
|
|
|
// I mean technically we shouldn't need to provide the parent, since we can
|
|
// just call that ourselves
|
|
void pq_swap(struct priority_queue *pq, int n, int parent) {
|
|
item_type item = pq->q[n];
|
|
pq->q[n] = pq->q[parent];
|
|
pq->q[parent] = item;
|
|
}
|
|
|
|
void pq_bubble_up(struct priority_queue *pq, int n) {
|
|
int pidx = pq_parent(n);
|
|
if (pidx == -1) {
|
|
return;
|
|
}
|
|
item_type parent = pq->q[pidx];
|
|
item_type node = pq->q[n];
|
|
|
|
if (parent > node) {
|
|
pq_swap(pq, n, pidx);
|
|
pq_bubble_up(pq, pidx);
|
|
}
|
|
}
|
|
|
|
void pq_bubble_down(struct priority_queue *pq, int n) {
|
|
int cidx = pq_young_child(n);
|
|
if (cidx > pq->len) {
|
|
return;
|
|
}
|
|
item_type child = pq->q[cidx];
|
|
item_type node = pq->q[n];
|
|
int min_idx = n;
|
|
|
|
if (cidx <= pq->len && node > child) {
|
|
min_idx = cidx;
|
|
}
|
|
if (cidx + 1 <= pq->len && pq->q[min_idx] > pq->q[cidx + 1]) {
|
|
min_idx = cidx + 1;
|
|
}
|
|
|
|
if (node > child) {
|
|
pq_swap(pq, n, min_idx);
|
|
pq_bubble_down(pq, min_idx);
|
|
}
|
|
}
|
|
|
|
void pq_insert(struct priority_queue *pq, item_type x) {
|
|
if (pq->len >= pq->capacity) {
|
|
printf("Error: Priority Queue Overflow");
|
|
return;
|
|
}
|
|
pq->q[++pq->len] = x;
|
|
pq_bubble_up(pq, pq->len);
|
|
}
|
|
|
|
item_type pq_pop_top(struct priority_queue *pq) {
|
|
if (pq->len == 0) {
|
|
printf("Error: No elements in Priority Qeueu");
|
|
return -1;
|
|
}
|
|
item_type top = pq->q[1];
|
|
pq->q[1] = pq->q[pq->len--];
|
|
pq_bubble_down(pq, 1);
|
|
return top;
|
|
}
|
|
|
|
struct priority_queue* pq = pq_create();
|
|
|
|
#define ELEMENTS 8
|
|
item_type arr[ELEMENTS];
|
|
|
|
for (int i = 0; i < ELEMENTS; i++) {
|
|
arr[i] = i + 1;
|
|
}
|
|
|
|
void reverse(item_type *arr) {
|
|
for (int i = 0; i < ELEMENTS / 2; i++) {
|
|
item_type temp = arr[i];
|
|
arr[i] = arr[ELEMENTS - i - 1];
|
|
arr[ELEMENTS - i - 1] = temp;
|
|
}
|
|
}
|
|
|
|
void shuffle(item_type *arr) {
|
|
for (int i = 0; i < ELEMENTS - 1; i++) {
|
|
float r = (float)rand() / RAND_MAX;
|
|
int idx = (int)(ELEMENTS * r);
|
|
item_type temp = arr[idx];
|
|
arr[idx] = arr[i];
|
|
arr[i] = temp;
|
|
}
|
|
}
|
|
|
|
void print_elems(item_type *arr) {
|
|
for (int i = 0; i < ELEMENTS; i++) {
|
|
printf("%d,", arr[i]);
|
|
if (i == ELEMENTS - 1) {
|
|
printf("\n");
|
|
}
|
|
}
|
|
}
|
|
|
|
void print_pq(struct priority_queue *pq) {
|
|
for (int i = 1; i <= pq->len; i++) {
|
|
printf("%d,", pq->q[i]);
|
|
if (i == pq->len) {
|
|
printf("\n");
|
|
}
|
|
}
|
|
}
|
|
|
|
reverse(arr);
|
|
for (int i = 0; i < ELEMENTS; i++) {
|
|
pq_insert(pq, arr[i]);
|
|
}
|
|
|
|
print_pq(pq);
|
|
|
|
pq_pop_top(pq);
|
|
|
|
print_pq(pq);
|
|
#+end_src
|
|
|
|
|
|
** 4.4 War Story
|
|
|
|
Apparently calculating the price of airline tickets is hard.
|
|
|
|
** 4.5 Mergesort
|
|
|
|
Uses Divide-and-Conquer, recursively partitioning elements into two groups. It
|
|
takes O(n log n), however the space complexity is linear, because in the
|
|
following code, we have to allocate memory to construct a new sorted array.
|
|
Doing so in place doesn't work because you're effectively destroying the
|
|
previous sorting. However, when working with linked lists, no extra allocations
|
|
are required since you can just rearrange what the pointers point to.
|
|
|
|
#+begin_src C :includes stdio.h stdlib.h
|
|
int *merge_sort(int *array, int start, int len) {
|
|
int *sorted = malloc(sizeof(int) * len);
|
|
if (len <= 1) {
|
|
sorted[0] = array[start];
|
|
return sorted;
|
|
}
|
|
int half = (len + 1) / 2;
|
|
int *sorted_l = merge_sort(array, start, half);
|
|
int *sorted_r = merge_sort(array, start + half, len / 2);
|
|
int size_r = len / 2;
|
|
int size_l = half;
|
|
int i = 0, ir = 0, il = 0;
|
|
for (; i < len; i++) {
|
|
if ((il >= size_l && ir < size_r) || (ir < size_r && sorted_l[il] > sorted_r[ir])) {
|
|
sorted[i] = sorted_r[ir++];
|
|
} else if (il < size_l) {
|
|
sorted[i] = sorted_l[il++];
|
|
}
|
|
}
|
|
free(sorted_l);
|
|
free(sorted_r);
|
|
return sorted;
|
|
}
|
|
|
|
#define AL 10
|
|
int array[AL] = { 'y','z','x','n','k','m','d','a','b','c' };
|
|
/* int array[AL] = { 9,8,6,7,4,5,2,3,1,0 }; */
|
|
/* int array[AL] = { 0,1,2,3,4,5,6,7,8,9 }; */
|
|
|
|
for (int i = 0; i < AL; i++) {
|
|
printf("%c-", array[i]);
|
|
} printf("\n");
|
|
|
|
int *sorted = merge_sort(array, 0, AL);
|
|
|
|
for (int i = 0; i < AL; i++) {
|
|
printf("%c-", sorted[i]);
|
|
}
|
|
#+end_src
|
|
|
|
#+RESULTS:
|
|
| y-z-x-n-k-m-d-a-b-c- |
|
|
| a-b-c-d-k-m-n-x-y-z- |
|
|
|
|
** 4.6 Quicksort
|
|
|
|
Quicksort depends on a randomly selected pivot in order to get O(n log n) in the
|
|
average case. This is because if you select the same index for the pivot each
|
|
time, there will always exist an arrangement of elements in an array that will
|
|
be the worst case and result in O(n^{2}).
|
|
|
|
The moral of the story is that randomization is helpful for improving
|
|
algorithms involved in sampling, hashing, and searching. The nuts and bolts
|
|
problem is a good example; if you have /n/ different sized bolts and /n/
|
|
matching nuts, how long would it take to match them all. Well if you pick a bolt
|
|
randomly and make two piles based on whether they are smaller or larger, then
|
|
you effectively ran a quicksort and were able to get it done in O(n log n) time,
|
|
rather than having to test each bolt with each nut.
|
|
|
|
One important note about Quicksort and why it's preferred over Mergesort is
|
|
because apparently in real world benchmarks, it outperforms it 2-3x. This is
|
|
likely due to the extra space requirements of mergesort. In particular if you
|
|
have to allocate memory on the heap
|
|
|
|
#+begin_src C :includes stdio.h stdlib.h time.h
|
|
void swap(int *i, int *j) {
|
|
int temp = *i;
|
|
,*i = *j;
|
|
,*j = temp;
|
|
}
|
|
|
|
void skiena_quicksort(int A[], int p, int q) {
|
|
if(p >= q) return;
|
|
|
|
swap(&A[p + (rand() % (q - p + 1))], &A[q]); // PIVOT = A[q]
|
|
int i = p - 1;
|
|
for(int j = p; j <= q; j++) {
|
|
if(A[j] <= A[q]) {
|
|
swap(&A[++i], &A[j]);
|
|
}
|
|
}
|
|
|
|
skiena_quicksort(A, p, i - 1);
|
|
skiena_quicksort(A, i + 1, q);
|
|
}
|
|
|
|
void quicksort(int *array, int start, int end) {
|
|
int len = end - start + 1;
|
|
if (len <= 1) {
|
|
return;
|
|
}
|
|
int r = rand() % len;
|
|
int pivot = array[start + r];
|
|
int mid = start;
|
|
int s = mid;
|
|
int h = end;
|
|
while (mid <= h) {
|
|
if (array[mid] < pivot) {
|
|
swap(&array[mid++], &array[s++]);
|
|
} else if (array[mid] > pivot) {
|
|
swap(&array[mid], &array[h--]);
|
|
} else {
|
|
mid++;
|
|
quicksort(array, start, s-1);
|
|
quicksort(array, s+1, end);
|
|
return;
|
|
}
|
|
}
|
|
quicksort(array, start, s-1);
|
|
quicksort(array, s+1, end);
|
|
}
|
|
|
|
#define LEN 10
|
|
/* int array[LEN] = {5,4,3,2,1}; */
|
|
int array[LEN] = {5,4,3,2,3,2,1,1,8,6};
|
|
|
|
for (int i = 0; i < LEN; i++) {
|
|
printf("%i", array[i]);
|
|
} printf("\n");
|
|
|
|
/* skiena_quicksort(array, 0, LEN - 1); */
|
|
quicksort(array, 0, LEN - 1);
|
|
|
|
for (int i = 0; i < LEN; i++) {
|
|
printf("%i", array[i]);
|
|
}
|
|
#+end_src
|
|
|
|
Usually in higher level languages like Python and Haskell, you would allocate
|
|
new arrays to hold the new sorted elements. However, it becomes more challenging
|
|
when doing it in place. This algorithm requires 3 pointers to keep track of the
|
|
mid point, the iterator, and the high, then finish once mid passes h.
|
|
|
|
*** Python
|
|
|
|
#+begin_src python
|
|
import random
|
|
|
|
def quicksort(arr):
|
|
if len(arr) < 2:
|
|
return arr
|
|
elif len(arr) == 2:
|
|
if arr[0] > arr[1]:
|
|
temp = arr[1]
|
|
arr[1] = arr[0]
|
|
arr[0] = temp
|
|
return arr
|
|
else:
|
|
# Pick a random pivot
|
|
index = random.randrange(0, len(arr))
|
|
pivot = arr.pop(index)
|
|
left = [x for x in arr if x <= pivot]
|
|
right = [x for x in arr if x > pivot]
|
|
return quicksort(left) + [pivot] + quicksort(right)
|
|
#+end_src
|
|
|
|
** 4.7 Distribution Sort: Bucketing
|
|
|
|
Two other sorting algorithms function similarly by subdividing the sorting
|
|
groups; bucketsort and distribution sort. The example given is that of a
|
|
phonebook. However, these are more heuristic and don't guarantee good
|
|
performance if the distribution of the data is not fairly uniform.
|
|
|
|
** 4.8 War Story
|
|
|
|
Apparently serving as an expert witness is quite interesting. That and hardware
|
|
is the platform.
|
|
|
|
** 4.9 Binary Search and Related Algorithms
|
|
|
|
If anyone ever asks you to play 20 questions where you have to guess a word,
|
|
just use binary search and before 20 tries you will have narroed down the
|
|
correct word. Counting occurences can have pretty good time if you want to have
|
|
constant space growth; just sort the array then count the repeat ocurrences in a
|
|
contiguous sequence. There's also a one-sided binary search in case you're
|
|
dealing with lazy sequences, where you search first by A[1], then A[2], A[4],
|
|
A[8], A[16], and so forth. You can also use these sorts of bisections on square
|
|
root problems, whatever those are.
|
|
|
|
Here's a simple binary search guessing game;
|
|
|
|
#+begin_src ocaml
|
|
let binary_search items target =
|
|
let rec f low high =
|
|
match (high - low) / 2 + low with
|
|
| mid when target = items.(mid) -> Some items.(mid)
|
|
| mid when target < items.(mid) -> f low mid
|
|
| mid when target > items.(mid) -> f mid high
|
|
| _ -> None
|
|
in f 0 (Array.length items);;
|
|
|
|
binary_search [|1;2;3;4;5|] 3;;
|
|
#+end_src
|
|
|
|
** 4.10 Divide-and-Conquer
|
|
|
|
Algorithms that use this technique, Mergesort being the classic example, have
|
|
other important applications such as Fourier transforms and Strassen's matrix
|
|
multiplication algorithm. What's important is understanding recurrence
|
|
relations.
|
|
|
|
*** Recurrence Relations
|
|
|
|
It is an equation that is defined in terms of itself, so I guess recursive.
|
|
Fibonacci is a good example F_{n} = F_{n-1} + F_{n-2}...
|
|
|
|
* Chapter 5 - Graph Traversal
|
|
|
|
** 5.1 Graphs
|
|
|
|
Graphs are G = (E,V), so a set of edges and a set of vertices. Many real world
|
|
systems can be modeled as graphs, such as road/city networks. Many algorithmic
|
|
problems become much simpler when modeled with graphs. There are several flavors
|
|
of graphs;
|
|
|
|
- Directed vs Undirected
|
|
- Weighted vs Unweighted
|
|
- Simple vs Non-simple
|
|
- Spares vs Dense
|
|
- Cyclic vs Acyclic
|
|
- Embedded vs Topological
|
|
- Implicted vs Explicit
|
|
- Labeled vs Unlabeled
|
|
|
|
Social networks provide an interesting way to analyze each one of these
|
|
considerations.
|
|
|
|
** 5.2 Data Structures for Graphs
|
|
|
|
Which data structure we use will impact the time and space complexity of certain
|
|
operations.
|
|
|
|
- Adjecency Matrix
|
|
You can use an /n/ x /m/ matrix, where each /(i,j)/ index answers whether an edge
|
|
exists. However, with sparse graphs, there might be a lot of wasted space.
|
|
|
|
- Adjencency Lists
|
|
Use linked lists instead, however it is harder to verify if a certain edge
|
|
exists. This can be mitigated by collecting them in a BFS of DFS.
|
|
|
|
#+name: graph
|
|
#+begin_src C :includes stdio.h stdlib.h stdbool.h
|
|
#define MAXV 1000
|
|
|
|
struct edgenode {
|
|
int y;
|
|
int weight;
|
|
struct edgenode *next;
|
|
};
|
|
|
|
struct graph {
|
|
struct edgenode *edges[MAXV];
|
|
int degree[MAXV];
|
|
int nvertices;
|
|
int nedges;
|
|
bool directed;
|
|
};
|
|
|
|
void initialize_graph(struct graph *g, bool directed) {
|
|
g->nvertices = 0;
|
|
g->nedges = 0;
|
|
g->directed = directed;
|
|
for (int i = 0; i < MAXV; i++) g->degree[i] = 0;
|
|
for (int i = 0; i < MAXV; i++) g->edges[i] = NULL;
|
|
}
|
|
|
|
// TODO: We have to read whether the graph is directed or not and
|
|
// insert edges twice
|
|
void insert_edge(struct graph *g, int x, int y) {
|
|
struct edgenode *p;
|
|
p = malloc(sizeof(struct edgenode));
|
|
p->weight = 0;
|
|
p->y = y;
|
|
p->next = g->edges[x];
|
|
|
|
g->edges[x] = p;
|
|
g->degree[x]++;
|
|
|
|
g->nedges++;
|
|
}
|
|
|
|
void print_graph(struct graph *g) {
|
|
int i;
|
|
struct edgenode *p;
|
|
|
|
for (i = 0; i < g->nvertices; i++) {
|
|
printf("V%d", i+1);
|
|
p = g->edges[i];
|
|
while (p != NULL) {
|
|
printf("->%d", p->y);
|
|
p = p->next;
|
|
}
|
|
printf("\n");
|
|
}
|
|
}
|
|
|
|
struct graph *g = malloc(sizeof(struct graph));
|
|
initialize_graph(g, true);
|
|
g->nvertices = 3;
|
|
insert_edge(g, 0, 1);
|
|
insert_edge(g, 0, 2);
|
|
insert_edge(g, 0, 3);
|
|
insert_edge(g, 1, 1);
|
|
insert_edge(g, 1, 3);
|
|
insert_edge(g, 2, 1);
|
|
|
|
print_graph(g);
|
|
#+end_src
|
|
|
|
|
|
** 5.3 War Story
|
|
|
|
Apparently, computers were really slow before. That and it's better to keep
|
|
asymptotics in mind even if it's about the same.
|
|
|
|
** 5.4 War Story
|
|
|
|
Apparently loading data itself can be really slow.
|
|
|
|
** 5.5 Traversing a Graph
|
|
|
|
When traversing a graph, it's useful to keep track of the state of each vertex,
|
|
whether the vertex has been /undiscovered/, /discovered/, or /processed/.
|
|
|
|
** 5.6 Breadth-First Search (BFS)
|
|
|
|
This is a C implementation of BFS, however at the moment it doesn't really do
|
|
much. The important thing here is that a BFS uses a queue to visit every node in
|
|
the graph.
|
|
|
|
#+begin_src C :includes stdio.h stdlib.h stdbool.h :noweb yes
|
|
<<queue>>
|
|
<<graph>>
|
|
|
|
bool processed[QBUFSIZE];
|
|
bool discovered[QBUFSIZE];
|
|
int parent[QBUFSIZE];
|
|
|
|
void initialize_search(struct graph *g) {
|
|
for (int i = 0; i < g->nvertices; i++) {
|
|
processed[i] = discovered[i] = false;
|
|
parent[i] = -1;
|
|
}
|
|
}
|
|
|
|
void process_edge(T vertex, T edge) {}
|
|
|
|
void bfs(struct graph *g, int start) {
|
|
struct queue *q = q_create();
|
|
initialize_search(g);
|
|
q_enqueue(q, start);
|
|
discovered[start] = true;
|
|
|
|
while (!q_empty(q)) {
|
|
T v = q_dequeue(q);
|
|
processed[v] = true;
|
|
struct edgenode *p = g->edges[v];
|
|
while (p != NULL) {
|
|
T y = p->y;
|
|
if ((processed[y] == false) || g->directed) {
|
|
process_edge(v, y);
|
|
}
|
|
if (discovered[y] == false) {
|
|
q_enqueue(q, y);
|
|
discovered[y] = true;
|
|
parent[y] = v;
|
|
}
|
|
p = p->next;
|
|
}
|
|
}
|
|
|
|
free(q);
|
|
}
|
|
#+end_src
|
|
|
|
** 5.7 Applications of BFS
|
|
|
|
The one project we worked on, the "Six Degrees of Bacon", which is just a six
|
|
degrees of separation problem. However, one interesting point to the "six
|
|
degrees of separation" between all humans is that it assumes that all human
|
|
social networks are "connected", meaning that there may be vertices on the graph
|
|
that are not connected to the others and form their own little cluster. BFS
|
|
works well for this, since it's basically a shortest path problem.
|
|
|
|
Other applications include solving a Rubiks cube so in theory you should now
|
|
know enough to be able to create a solver, because each legal move up until the
|
|
solved move should be connected on the graph.
|
|
|
|
The vertex-coloring problem assigns each vertex a label or color such that no
|
|
edge links any two vertices of the same label/color, ideally using as few colors
|
|
as possible. A graph is /bipartite/ if it can be colored without conflicts with
|
|
just two colors.
|
|
|
|
** 5.8 Depth-first Search (DFS)
|
|
|
|
While BFS uses a queue, DFS uses a stack, but since recursion models a stack
|
|
already, we don't need an extra data structure. Here is the C implementation
|
|
mostly just copied from the book.
|
|
|
|
#+begin_src C :includes stdio.h stdlib.h stdbool.h :noweb yes
|
|
<<graph>>
|
|
void dfs(struct graph *g, int v) {
|
|
edgenode *p;
|
|
int y;
|
|
|
|
// No idea where this symbol comes from
|
|
// They seem to be global
|
|
if (finished) return;
|
|
|
|
discovered[v] = true;
|
|
// And these two as well
|
|
entry_time[v] = ++time;
|
|
p = g->edges[v];
|
|
while (p != NULL) {
|
|
y = p->y;
|
|
if (discovered[y] == false) {
|
|
parent[y] == v;
|
|
process_edge(v, y);
|
|
dfs(g, y);
|
|
} else if (!processed[y] || g->directed) {
|
|
process_edge(v, y);
|
|
if (finished) return;
|
|
p = p->next;
|
|
}
|
|
|
|
time = time + 1;
|
|
exit_time[v] = time;
|
|
processed[v] = true;
|
|
}
|
|
}
|
|
#+end_src
|
|
|
|
This uses a technique called Backtracking which still hasn't been explored, but
|
|
the idea is that it goes further down until it cannot process anything further,
|
|
then goes all the way back up to where it can start branching out again.
|
|
|
|
** 5.9 Applications of DFS
|
|
|
|
DFS is useful for finding articulation vertices, which are basically weak points
|
|
where removal will cause other vertices to become disconnected. It's also useful
|
|
for finding cycles.
|
|
|
|
** 5.10 DFS on Directed Graphs
|
|
|
|
The important operation here is topological sorting which is useful with
|
|
Directed Acyclic Graphs (DAGs). We can also check if a DAG is strongly
|
|
connected, meaning that we won't run into any dead ends since we cannot
|
|
backtrack. However, these graphs can be partitioned.
|
|
|
|
* Chapter 6 - Weighted Graph Algorithms
|
|
|
|
** 6.1 Minimum Spanning Trees
|
|
|
|
Weighted graphs open up a whole new universe of algorithms which tend to be a
|
|
bit more helpful in modeling real world stuff like roads. A minimum spanning
|
|
tree would have all vertices connected but uses the smallest weights for all its
|
|
edges.
|
|
|
|
Prim's Algorithm can be used to construct a spanning tree but because it is a
|
|
greedy algorithm. Kruskal's algorithm is a more efficient way to find MSTs with
|
|
the use of a /union-find/. This data structure is a /set partition/, which finds
|
|
disjointed subsets. These can also be used to solve other interesting problems;
|
|
|
|
- Maximum Spanning Trees
|
|
- Minimum Product Spanning Trees
|
|
- Minimum Bottleneck Spanning Tree
|
|
|
|
However, Steiner Tree and Low-degree Spanning Tree apparently cannot be solved
|
|
with the previous two algorithms.
|
|
|
|
** 6.2 War Story
|
|
|
|
Minimum Spanning Trees and its corresponding algorithms can help solve Traveling
|
|
Salesman Problems
|
|
|
|
** 6.3 Shortest Paths
|
|
|
|
In an unweighted graph, BFS will find the shortest path between two nodes. For
|
|
weighted graphs the shortest path might have many more edges. Dijstra's
|
|
algorithm can help us find the shortest path in a weighted path. It runs in
|
|
O(n^{2}) time. One caveat is that it does not work with negative weighted edges.
|
|
|
|
Another problem in this space is the all-pairs shortest path, and for this we
|
|
would use the O(n^{3}) Floyd-Warshall algorithm which uses an adjacency matrix
|
|
instead since we needed to construct one anyway to track all the possible pairs.
|
|
It's a useful algorithm for figuring out /transitive closure/ which is a fancy way
|
|
of asking if some vertices are reachable from a node.
|
|
|
|
*** TODO Dijstra's
|
|
#+begin_src C :includes stdio.h stdlib.h
|
|
|
|
#+end_src
|
|
|
|
|
|
** 6.4 War Story
|
|
|
|
Kids are probably too young to know what the author is even talking about with these
|
|
phone codes.
|
|
|
|
** 6.5 Network Flows and Bipartite Matching
|
|
|
|
Bipartite matching is where no two edges share a vertex. There are also residual
|
|
flow graphs which are useful for network bandwidth optimization. Flow algorithms
|
|
generally solve problems related to edge and vertex connectivity. Edmonds and
|
|
Karp runs at O(n^{3}).
|
|
|
|
*** TODO Network Flow
|
|
|
|
** 6.6 Design Graphs, Not Algorithms
|
|
|
|
Modeling problems as graphs is useful for a variety of applications; pathfinding
|
|
in games, DNA sequencing, smallest set of non-overlapping rectangles in a plain,
|
|
filename shortening, line segmentation for optical character-recognition, and
|
|
the list goes on.
|
|
|
|
* Chapter 7 - Combinatorial Search and Heuristic Methods
|
|
|
|
** 7.1 Backtracking
|
|
|
|
|