#+TITLE: Notes & Exercises: The Algorithm Design Manual
#+AUTHOR: Joseph Ferano
#+OPTIONS: ^:{}
* Chapter 1
** 1.1 Robots
An /algorithm/ is a procedure that takes any of the possible input instances
and transforms it to the desired output.
The robot arm problem is presented, where an arm soldering contact points must
visit all points in the shortest path possible.
The first algorithm considered is ~NearestNeighbor~. However, this is naïve,
and the arm hopscotches around on bad instances.
Next we consider ~ClosestPair~, but that too misses in certain instances.
Next is ~OptimalTSP~, which will always give the correct result because it
enumerates all possible orderings and returns the one with the shortest path.
For 20 points, however, the algorithm is infeasible, since it grows at a rate
of O(n!). TSP stands for Traveling Salesman Problem.
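A quick sketch of the nearest-neighbor idea over points on a line (the point
values and starting index are my own, not the book's instance):
#+begin_src C :includes stdio.h stdbool.h
/* Hop to the nearest unvisited point each time; greedy and fast,
   but the arm can hopscotch back and forth across the line. */
void nearest_neighbor_tour(double *p, int n, int start) {
    bool visited[n];
    for (int i = 0; i < n; i++) visited[i] = false;
    int cur = start;
    visited[cur] = true;
    printf("%g", p[cur]);
    for (int step = 1; step < n; step++) {
        int next = -1;
        double best = -1;
        for (int j = 0; j < n; j++) {
            if (visited[j]) continue;
            double d = p[j] > p[cur] ? p[j] - p[cur] : p[cur] - p[j];
            if (next == -1 || d < best) { best = d; next = j; }
        }
        visited[next] = true;
        cur = next;
        printf(" -> %g", p[cur]);
    }
    printf("\n");
}

double pts[] = { 0, 1, -1, 3, -5, 11 };
nearest_neighbor_tour(pts, 6, 0);  /* visits 0, 1, -1, 3, -5, 11 */
#+end_src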
*** TODO Implement NearestNeighbor
*** TODO Implement ClosestPair
*** TODO Implement OptimalTSP for N < 8
** 1.2 Right Jobs
Here we are introduced to the movie scheduling problem, where we try to pick
the largest set of mutually non-overlapping movies an actor can take on to
maximize their time. Algorithms considered are ~EarliestJobFirst~, which starts
jobs as soon as possible, and ~ShortestJobFirst~, which finishes the quickest
job first, but both fail to find optimal solutions.
~ExhaustiveScheduling~ grows at a rate of O(2^{n}), which is much better than
the O(n!) of the previous problem. Finally, ~OptimalScheduling~ improves
efficiency by first removing overlapping candidates so that it doesn't even
compare them; a sketch follows below.
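One way to realize this: sort jobs by completion time and greedily accept
whatever doesn't overlap the last accepted job. A minimal sketch (the struct
layout and sample jobs are my own assumptions):
#+begin_src C :includes stdio.h
typedef struct { int start, end; } Interval;

/* Greedy: jobs must be sorted by end time; accept any job that starts
   after the previously accepted one finishes. */
int max_movies(Interval *jobs, int n) {
    int count = 0, last_end = -1;
    for (int i = 0; i < n; i++) {
        if (jobs[i].start > last_end) {
            count++;
            last_end = jobs[i].end;
        }
    }
    return count;
}

/* Already sorted by end time. */
Interval jobs[] = { {1,4}, {3,5}, {0,6}, {5,7}, {3,8}, {6,9} };
printf("max non-overlapping movies: %d\n", max_movies(jobs, 6));  /* 2 */
#+end_src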
** 1.3 Correctness
It's important to be clear about the steps in pseudocode when designing
algorithms on paper. There are important heuristics for checking algorithm
correctness:
- Verifiability
- Simplicity
- Think small
- Think exhaustively
- Hunt for the weakness
- Go for a tie
- Seek extremes
Other techniques include *Induction*, *Recursion*, and *Summations*.
** 1.4 Modeling
Most algorithms are designed to work on rigorously defined abstract
structures. These fundamental structures include:
- Permutations
- Subsets
- Trees
- Graphs
- Points
- Polygons
- Strings
** 1.5-1.6 War Story about Psychics
* Chapter 2
** 2.1 RAM Model of Computation
This is a simplified model of computation where:
- Each simple operation takes 1 step
- Loops and subroutines are compositions of simple operations
- Each memory access takes one time step
Like the flat earth model, which engineers happily use when designing
structures because the curvature of the Earth doesn't matter at that scale, it
is technically wrong but useful in practice.
We can already apply the concepts of worst, average, and best case to this
model.
** 2.2 Big Oh
The previous model often requires concrete implementations to actually measure
correctly, so instead Big Oh gives us a better, simpler framework for discussing
the relative performance between algorithms. It ignores factors that don't
impact how algorithms scale.
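For reference, the formal definition: f(n) = O(g(n)) means there are positive
constants c and n_{0} such that f(n) <= c * g(n) for all n >= n_{0}. For
example, 3n^{2} + 5n = O(n^{2}) with c = 4 and n_{0} = 5.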
** 2.3 Growth Rates and Dominance Relations
These are the functions that commonly occur in algorithm analyses:
- *Constant O(1)*
Hashtable look up, array look up, consing a list
- *Logarithmic O(log n)*
Binary Search
- *Linear O(n)*
Iterating over a list
- *Superlinear O(n log n)*
Quicksort and Mergesort
- *Quadratic* O(n^{2})
Insertion Sort and Selection Sort
- *Cubic* O(n^{3})
Some dynamic programming problems
- *Exponential* O(c^{n}) for any constant c > 1
Enumerate all subsets
- *Factorial O(n!)*
Generating all permutations or orderings
*Notes*:
- O(n!) algorithms become useless for anything n >= 20
- O(2^{n}) algorithms become impractical for anything n > 40
- O(n^{2}) algorithms start deteriorating after n > 10,000; a million is hopeless
- O(n) and O(n log n) are fine up to 1 billion
** 2.4 Working with Big Oh
You can do arithmetic on Big Oh functions: addition keeps the dominant term,
O(f(n)) + O(g(n)) = O(max(f(n), g(n))), multiplying by a constant changes
nothing, O(c * f(n)) = O(f(n)), and multiplying functions multiplies the
bounds, O(f(n)) * O(g(n)) = O(f(n) * g(n)). For example,
O(n^{2}) + O(n) = O(n^{2}).
** 2.5 Efficiency
*** Selection Sort
**** C
#+begin_src C :includes stdio.h
void print_nums(int *nums, int length) {
    for (int i = 0; i < length; i++) {
        printf("%d,", nums[i]);
    }
    printf("\n");
}

/* Repeatedly select the smallest remaining element and swap it into
   position i; two nested passes over the array give O(n^2). */
void selection_sort(int *nums, int length) {
    int i, j;
    int min_idx;
    for (i = 0; i < length; i++) {
        print_nums(nums, length);
        min_idx = i;
        for (j = i + 1; j < length; j++) {
            if (nums[j] < nums[min_idx]) {
                min_idx = j;
            }
        }
        int temp = nums[min_idx];
        nums[min_idx] = nums[i];
        nums[i] = temp;
    }
}

int nums[9] = { 2, 4, 9, 1, 3, 8, 5, 7, 6 };
selection_sort(nums, 9);
#+end_src
#+RESULTS:
| 2 | 4 | 9 | 1 | 3 | 8 | 5 | 7 | 6 | |
| 1 | 4 | 9 | 2 | 3 | 8 | 5 | 7 | 6 | |
| 1 | 2 | 9 | 4 | 3 | 8 | 5 | 7 | 6 | |
| 1 | 2 | 3 | 4 | 9 | 8 | 5 | 7 | 6 | |
| 1 | 2 | 3 | 4 | 9 | 8 | 5 | 7 | 6 | |
| 1 | 2 | 3 | 4 | 5 | 8 | 9 | 7 | 6 | |
| 1 | 2 | 3 | 4 | 5 | 6 | 9 | 7 | 8 | |
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 9 | 8 | |
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |
*** Insertion Sort
**** C
#+begin_src C :includes stdio.h
/* Grow a sorted prefix; each new element is swapped leftward until it
   meets a smaller one. Check j > 0 before reading nums[j - 1]. */
void insertion_sort(int *nums, int len) {
    int i, j;
    for (i = 1; i < len; i++) {
        j = i;
        while (j > 0 && nums[j] < nums[j - 1]) {
            int temp = nums[j];
            nums[j] = nums[j - 1];
            nums[j - 1] = temp;
            j--;
        }
    }
}

int nums[8] = {1,4,5,2,8,3,7,9};
insertion_sort(nums, 8);
for (int i = 0; i < 8; i++) {
    printf("%d", nums[i]);
}
#+end_src
#+RESULTS:
: 12345789
*** TODO String Pattern Matching
*** TODO Matrix Multiplication
** 2.6 Logarithms
Logarithms are the inverse of exponentiation. Binary search on a sorted list
is the classic O(log n) algorithm. Logarithms also show up in fast
exponentiation, binary trees, harmonic numbers, and even criminal sentencing.
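A minimal binary search sketch (the sample array is my own):
#+begin_src C :includes stdio.h
/* Returns the index of target in the sorted array, or -1 if absent.
   Each probe halves the search range, hence O(log n). */
int binary_search(int *nums, int length, int target) {
    int lo = 0, hi = length - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (nums[mid] == target) return mid;
        else if (nums[mid] < target) lo = mid + 1;
        else hi = mid - 1;
    }
    return -1;
}

int nums[] = {1, 3, 4, 7, 9, 11, 15};
printf("index of 9: %d\n", binary_search(nums, 7, 9));   /* 4 */
printf("index of 8: %d\n", binary_search(nums, 7, 8));   /* -1 */
#+end_src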
** 2.7 Properties of Logarithms
Common bases for logarithms include 2, /e/, and 10. The base has no real
impact on the growth rate, since log_{a} n = log_{b} n / log_{b} a: changing
base only multiplies by a constant, so log_{2} n and log_{3} n are equivalent
up to a constant factor.
** 2.8 War Story Pyramids
Cool story bro
** 2.9 Advanced Analysis
Some advanced stuff:
- *Inverse Ackermann Function*
Union-Find data structure
- *log log n*
Binary search on a sorted array of only log n items
- *log n / log log n*
- log^{2} n
- \sqrt{n}
There are also limits and dominance relations
* Chapter 3
** 3.1 Contiguous vs Linked Data Structures
Advantages of arrays:
- Constant-time access given the index
- Space efficiency
- Memory locality
The downside is that they can't grow, but dynamic arrays fix this by
allocating a new, bigger array when needed.
Advantages of linked structures:
- No overflow; they can keep growing
- Insertions/deletions are simpler
- Moving a collection of pointers is cheaper than moving contiguous data
However, linked structures require extra space for the pointer fields, as the
sketch below shows.
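A minimal singly linked list sketch showing the extra pointer field and O(1)
insertion at the head (names are my own):
#+begin_src C :includes stdio.h stdlib.h
typedef struct Node {
    int value;
    struct Node *next;  /* the pointer field costs extra space */
} Node;

/* Insertion at the head is O(1): no shifting, just pointer surgery. */
Node *push_front(Node *head, int value) {
    Node *n = malloc(sizeof(Node));
    n->value = value;
    n->next = head;
    return n;
}

Node *list = NULL;
list = push_front(list, 3);
list = push_front(list, 2);
list = push_front(list, 1);
for (Node *p = list; p != NULL; p = p->next) printf("%d ", p->value);
#+end_src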
** 3.2 Stacks and Queues
*** Stacks
(/PUSH/, /POP/) LIFO, useful in executing recursive algorithms.
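A minimal fixed-capacity stack sketch (the capacity and names are my own):
#+begin_src C :includes stdio.h
#define CAPACITY 16

/* LIFO stack backed by a fixed array; top indexes the next free slot. */
typedef struct { int items[CAPACITY]; int top; } Stack;

void push(Stack *s, int x) { s->items[s->top++] = x; }
int  pop(Stack *s)         { return s->items[--s->top]; }
int  empty(Stack *s)       { return s->top == 0; }

Stack s = { .top = 0 };
push(&s, 1); push(&s, 2); push(&s, 3);
while (!empty(&s)) printf("%d ", pop(&s));  /* prints 3 2 1 */
#+end_src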
*** Queues
(/ENQUEUE/, /DEQUEUE/) FIFO, useful for breadth-first searches in graphs.
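And a matching ring-buffer queue sketch (again, capacity and names are my own):
#+begin_src C :includes stdio.h
#define CAP 16

/* FIFO queue as a ring buffer; head chases tail around the array. */
typedef struct { int items[CAP]; int head, tail, count; } Queue;

void enqueue(Queue *q, int x) {
    q->items[q->tail] = x;
    q->tail = (q->tail + 1) % CAP;
    q->count++;
}

int dequeue(Queue *q) {
    int x = q->items[q->head];
    q->head = (q->head + 1) % CAP;
    q->count--;
    return x;
}

Queue q = { .head = 0, .tail = 0, .count = 0 };
enqueue(&q, 1); enqueue(&q, 2); enqueue(&q, 3);
while (q.count > 0) printf("%d ", dequeue(&q));  /* prints 1 2 3 */
#+end_src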
** 3.3 Dictionaries
Not just hashtables, but anything that provides access to data by content.
Some dictionaries are implemented with trees instead of hashing. Both
contiguous and linked structures can be used, with tradeoffs between them.
** 3.4 Binary Search Trees
BSTs have a parent and two child nodes, left and right. They support
insertion, deletion, and traversal. Interestingly, min and max can be found by
seeking the leftmost and rightmost node respectively. Each operation runs in
O(h), where h is the height of the tree, so BSTs perform well so long as they
remain balanced (h \approx log n).
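A minimal BST search/min sketch to make the O(h) claim concrete (node layout
and sample values are my own):
#+begin_src C :includes stdio.h stdlib.h
typedef struct Node { int key; struct Node *left, *right; } Node;

Node *make(int key, Node *left, Node *right) {
    Node *n = malloc(sizeof(Node));
    n->key = key; n->left = left; n->right = right;
    return n;
}

/* Each comparison discards one subtree, so search costs O(h). */
Node *search(Node *t, int key) {
    if (t == NULL || t->key == key) return t;
    return key < t->key ? search(t->left, key) : search(t->right, key);
}

/* Min: keep walking left. Max is symmetric (keep walking right). */
Node *minimum(Node *t) {
    while (t->left != NULL) t = t->left;
    return t;
}

/*      4
       / \
      2   7     */
Node *root = make(4, make(2, NULL, NULL), make(7, NULL, NULL));
printf("found 7: %s\n", search(root, 7) ? "yes" : "no");
printf("min: %d\n", minimum(root)->key);
#+end_src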
** 3.5 Priority Queues
They allow new elements to enter a system at arbitrary intervals while always
handing back the most important one first. The core operations are insert,
find-minimum (or find-maximum), and delete-minimum (or delete-maximum).
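Priority queues are typically implemented with a binary heap; a minimal
min-heap sketch (the fixed capacity and int keys are my own simplifications):
#+begin_src C :includes stdio.h
#define MAX 64

/* Binary min-heap in an array: parent of i is (i-1)/2, children 2i+1, 2i+2. */
int heap[MAX];
int size = 0;

void swap(int i, int j) { int t = heap[i]; heap[i] = heap[j]; heap[j] = t; }

void insert(int x) {
    int i = size++;
    heap[i] = x;
    while (i > 0 && heap[i] < heap[(i - 1) / 2]) {  /* bubble up */
        swap(i, (i - 1) / 2);
        i = (i - 1) / 2;
    }
}

int extract_min(void) {
    int min = heap[0];
    heap[0] = heap[--size];
    int i = 0;
    for (;;) {  /* sift down to the smaller child */
        int l = 2*i + 1, r = 2*i + 2, smallest = i;
        if (l < size && heap[l] < heap[smallest]) smallest = l;
        if (r < size && heap[r] < heap[smallest]) smallest = r;
        if (smallest == i) break;
        swap(i, smallest);
        i = smallest;
    }
    return min;
}

insert(5); insert(1); insert(9); insert(3);
while (size > 0) printf("%d ", extract_min());  /* prints 1 3 5 9 */
#+end_src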
** 3.6 War Story
Rather than storing all of the vertices of a mesh separately, you can share
them between the different triangles. But connecting all vertices in a single
strip means visiting each vertex exactly once, a Hamiltonian path, and finding
one is NP-complete. Instead they used a greedy heuristic that always grabs the
best possible triangle next. Backed by a priority queue, this reduced the
running time by several orders of magnitude compared to the naïve approach.
** 3.7 Hashing and Strings
A hash function maps each key to a big integer; take it modulo the table size
/m/, and if /m/ is a large prime you'll get a fairly uniform distribution. The
two main ways to resolve collisions are /Chaining/ and /Open Addressing/.
Chaining gives each bucket a linked list to which colliding items are
appended. Open addressing instead probes for nearby empty buckets.
Hashing is also useful when dealing with strings, in particular for substring
pattern matching. Overlaying pattern /p/ over every position in text /t/ costs
O(mn). With hashing, you can instead hash each window of /t/, updating the
hash incrementally as the window slides, and compare it against the hash of
/p/, giving expected O(n + m). This is the *Rabin-Karp algorithm*. Hash
collisions can cause false positives, so candidate matches are verified, and a
good hash function keeps them rare.
Hashing is so important that Yahoo! employs it extensively.
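A minimal Rabin-Karp sketch (the base, modulus, and function name are my own
choices):
#+begin_src C :includes stdio.h string.h
/* Polynomial hash mod a large prime, updated in O(1) per window shift. */
#define BASE 256
#define PRIME 1000003

int rabin_karp(const char *t, const char *p) {
    int n = strlen(t), m = strlen(p);
    if (m > n) return -1;
    long ph = 0, th = 0, high = 1;          /* high = BASE^(m-1) mod PRIME */
    for (int i = 0; i < m - 1; i++) high = (high * BASE) % PRIME;
    for (int i = 0; i < m; i++) {
        ph = (ph * BASE + p[i]) % PRIME;
        th = (th * BASE + t[i]) % PRIME;
    }
    for (int i = 0; i + m <= n; i++) {
        /* On a hash hit, verify to rule out false positives. */
        if (ph == th && strncmp(t + i, p, m) == 0) return i;
        if (i + m < n) {  /* roll the window: drop t[i], add t[i+m] */
            th = ((th - t[i] * high % PRIME + PRIME) * BASE + t[i + m]) % PRIME;
        }
    }
    return -1;
}

printf("%d\n", rabin_karp("the quick brown fox", "brown"));  /* prints 10 */
#+end_src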
** 3.8 Specialized Data Structures
These include:
- String
Characters in an array
- Geometric
Collection of points and regions/polygons
- Graph
Using adjacency matrices
- Set
Dictionaries and bit vectors
** 3.9 War Story
They were trying to implement sequencing by hybridization (SBH), but ran into
issues when they used a BST. Then they tried a hashtable, then a trie. Finally
what worked was a compressed suffix tree.
** Exercises
*** 3.42
Reverse the words in a sentence—that is, “My name is Chris” becomes “Chris is
name My.” Optimize for time and space.
#+begin_src C :includes stdlib.h stdio.h string.h
/* Reverse length characters of string in place. */
void reverse_word(char *string, int length) {
    for (int i = 0; i < length / 2; i++) {
        char temp = string[i];
        string[i] = string[length - 1 - i];
        string[length - 1 - i] = temp;
    }
}

/* Reverse the whole sentence, then re-reverse each word so the words
   read forward but appear in reverse order. O(n) time, O(1) space. */
void reverse_words(char *string, int length) {
    printf("Before: %s\n", string);
    reverse_word(string, length);
    printf("After: %s\n", string);
    int start = 0;
    for (int i = 0; i < length; i++) {
        if (string[i] == ' ' || i == length - 1) {
            if (i == length - 1) i++;  /* include the final character */
            reverse_word(&string[start], i - start);
            start = i + 1;
        }
    }
}

char str[] = "My name is Chris";
reverse_words(str, strlen(str));
printf("Final: %s\n", str);
#+end_src
#+RESULTS:
| Before: | My | name | is | Chris |
| After: | sirhC | si | eman | yM |
| Final: | Chris | is | name | My |