Algorithms/Mathematical Background


Before we begin learning algorithmic techniques, we take a detour to give ourselves some necessary mathematical tools. First, we cover mathematical definitions of terms that are used later on in the book. By expanding your mathematical vocabulary you can be more precise and you can state or formulate problems more simply. Following that, we cover techniques for analysing the running time of an algorithm. After each major algorithm covered in this book, we give an analysis of its running time as well as a proof of its correctness.

Mathematical Definitions

A set can be intuitively thought of as a collection of different objects. Such an object can be anything, including another set. The fundamental operation on a set is "member of", written "∈". Two sets are equal if and only if they contain the same members. The following are all sets:

{1, 2}
    The set containing the integers "1" and "2".
ℕ
    The set of all natural numbers (including 0).
ℤ
    The set of all integers.
ℚ
    The set of all rational numbers.
ℝ
    The set of all real numbers.
∅ = {}
    The set that contains nothing, known as the empty set.

We can also describe a new set using a set comprehension of the form

{i ∈ S : P(i)}

where S is an already-defined set and P is an arbitrary predicate (something that is either true or false of everything it is applied to). You get the new set by going through all of the elements i in S (whose elements you already know), and picking out the ones for which P(i) is true. For example,

{i ∈ ℤ : i mod 2 = 0}

is the set of all even integers. Frequently in this book, we will leave out the S and expect it to be implied from the context.
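
Many programming languages borrow this notation more or less directly. As a rough illustration (in Python, which is not the pseudocode used elsewhere in this book), the comprehension {i ∈ S : P(i)} becomes {i for i in S if P(i)}:

# A finite stand-in for the set S, with the predicate P(i) = "i is even".
S = range(20)
evens = {i for i in S if i % 2 == 0}
print(sorted(evens))    # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]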

An ordered n-tuple, often called just an n-tuple, is an ordered group of n objects, possibly not all of the same type. It is written "⟨a, b, c, …⟩". The same object can appear more than once in a given n-tuple. Two n-tuples are considered equal when they have exactly the same objects in exactly the same positions. The most common n-tuple is the "ordered pair," written "⟨a, b⟩".

We can compare sets based on their elements. The fundamental comparisons are true when one set contains all of the members of another set. In the following definitions, A, B, and C are sets.

Comparison name    Notation   Mathematical definition   English definition
Subset             A ⊆ B      x ∈ A ⇒ x ∈ B             All members of A are also members of B. For any set A, ∅ ⊆ A and A ⊆ A.
Proper subset      A ⊂ B      A ⊆ B ∧ A ≠ B             This comparison is used when we want to exclude the set itself.
Superset           A ⊇ B      B ⊆ A                     All members of B are also members of A.
Proper superset    A ⊃ B      B ⊂ A                     All members of B are also members of A, and A has at least one member not in B.

There are also many operations that apply to sets.

Operation name      Notation                    Mathematical definition    English definition
Union               A ∪ B                       {x : x ∈ A ∨ x ∈ B}        The set of things that are in either A or B.
Intersection        A ∩ B                       {x : x ∈ A ∧ x ∈ B}        The set of things that are in both A and B.
Set difference      A ∖ B or sometimes A − B    {x : x ∈ A ∧ x ∉ B}        The set of things in A but not in B.
Complement          A′, Aᶜ, or Ā                {x ∈ U : x ∉ A}            The set of things not in A. This is only well-defined with respect to some implied universal set U.
Power set           𝒫(A) or 2ᴬ                  {S : S ⊆ A}                The set of all subsets of A. For any set A, ∅ ∈ 𝒫(A) and A ∈ 𝒫(A).
Cartesian product   A × B                       {⟨a, b⟩ : a ∈ A ∧ b ∈ B}   The set of every pairing of an element from A with an element from B.
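
For readers who like to experiment, Python's built-in set type happens to support most of these operations directly; the short sketch below (plain Python, not the book's pseudocode) mirrors the table above:

A = {1, 2, 3}
B = {3, 4}
U = {1, 2, 3, 4, 5}                        # an explicit universal set for the complement

print(A | B)                               # union: {1, 2, 3, 4}
print(A & B)                               # intersection: {3}
print(A - B)                               # set difference: {1, 2}
print(U - A)                               # complement of A relative to U: {4, 5}
print({(a, b) for a in A for b in B})      # Cartesian product, as a set of ordered pairs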

We can extend the Cartesian product definition to include n-ary products, defined as

A₁ × ⋯ × Aₙ = {⟨a₁, …, aₙ⟩ : aᵢ ∈ Aᵢ ∧ 1 ≤ i ≤ n}.

Using this extension, we can also define Cartesian exponentiation:

Aⁿ = A × ⋯ × A  (n times).

A relation R on a set S is a subset of S × S. We often write x R y to mean ⟨x, y⟩ ∈ R. Some common relations are: "is taller than", "knows about", "is derived from", "lives near", and "<". We can classify relations by the properties they satisfy:

A relation R on S is      when (for all a, b, and c in S)
reflexive                 a R a
irreflexive               ¬(a R a)
symmetric                 if a R b then b R a
antisymmetric             if a R b and b R a, then a = b
asymmetric                if a R b then ¬(b R a)
transitive                if a R b and b R c then a R c
intransitive              if a R b and b R c then ¬(a R c)
a weak partial order      it is reflexive, antisymmetric, and transitive
a weak total order        it is a weak partial order and either a R b or b R a
a strict partial order    it is irreflexive, asymmetric, and transitive
a strict total order      it is a strict partial order and either a R b, b R a, or a = b
well-founded              there are no infinite descending chains in S
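
To make a few of these properties concrete, here is a small Python sketch (an illustration only; the helper names are ours) that tests a finite relation, given as a set of ordered pairs over a set S:

def is_reflexive(R, S):
    return all((a, a) in R for a in S)

def is_symmetric(R):
    return all((b, a) in R for (a, b) in R)

def is_transitive(R):
    return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)

S = {1, 2}
R = {(1, 1), (1, 2)}           # the example relation discussed below
print(is_reflexive(R, S))      # False: (2, 2) is missing
print(is_symmetric(R))         # False: (1, 2) is present but (2, 1) is not
print(is_transitive(R))        # True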

We usually use the symbol "≤" for a weak order and "<" for a strict order. The set of integers, the set of floating-point numbers, and the set of strings can all have a total ordering defined on them.

Note that none of the pairs of reflexive and irreflexive, symmetric and asymmetric, or transitive and intransitive are opposites of each other. For example, the relation {⟨1, 1⟩, ⟨1, 2⟩} is neither reflexive nor irreflexive.

A sequence is an ordered list, ordered multiset, or an ordered array of items of the same type. The following variables are all sequences:

let A := array {"a", "tree", "that", "is", "inflexible", "will", "snap"}
let B := list {1, 4, 1, 5, 9, 2, 6, 5, 3, 5}
let C := array {}

Here, A is an array of strings, B is a list of integers, and C is an empty array. Note that B contains the values "1" twice and "5" three times.

A subsequence of a sequence is a new sequence formed by deleting some elements from the old sequence and leaving the remaining elements in the same relative order. For example,

let A_T_words:= array {"tree", "that"}
let B_evens := list {4, 2, 6}
let B_odds := list {1, 1, 5, 9, 5, 3, 5}
let B_primes := list {5, 2, 5, 3, 5}
let B_empty := list {}

Note that the empty sequence (C and B_empty, above) is a subsequence of every sequence. Also note that a sequence can have many different subsequences.
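
Checking whether one sequence is a subsequence of another only needs a single left-to-right scan. A rough Python sketch (the function name is ours, not part of the book's pseudocode):

def is_subsequence(sub, seq):
    # Each element of sub must be found in seq, strictly after the previous match;
    # using one shared iterator over seq enforces the "same relative order" rule.
    it = iter(seq)
    return all(x in it for x in sub)

B = [1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
print(is_subsequence([4, 2, 6], B))   # True  (B_evens above)
print(is_subsequence([6, 2], B))      # False (the order is wrong)
print(is_subsequence([], B))          # True  (the empty sequence)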

A median of a set of values is the value that separates the highest half of the values from the lowest half. To find the median, arrange all the observations from lowest value to highest value and pick the middle one. If there are an even number of values, the median can be defined as the mean of the two middle values.
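
That procedure translates almost word for word into code; here is a minimal Python sketch (illustrative only):

def median(values):
    # Sort the observations, then take the middle one
    # (or the mean of the two middle ones for an even count).
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([9, 1, 5]))        # 5
print(median([1, 2, 3, 4]))     # 2.5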

The closed form of a function is a description of the function in mathematical terms that uses a bounded number of well-known operations. For example, expressions using only addition, subtraction, multiplication, division, exponentiation, negation, absolute value, and the factorial function would be considered closed form. Specifically, a description in which the number of operations depends upon the value of the function's variable is not allowed. For example, here is a non-closed-form version of the function sum(n), the sum of the first n positive integers:

sum(n) = ∑ᵢ₌₁ⁿ i = 1 + 2 + ⋯ + (n − 1) + n.

And, here is the closed-form version of that same function:

sum(n) = n(n + 1) / 2.
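
The practical difference between the two descriptions is easy to see in code: the non-closed form performs n additions, while the closed form performs a fixed number of operations no matter how large n is. A small Python sketch:

def sum_loop(n):
    # Non-closed form: the amount of work grows with n.
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

def sum_closed(n):
    # Closed form: a constant number of operations.
    return n * (n + 1) // 2

print(sum_loop(100), sum_closed(100))   # 5050 5050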

Asymptotic Notation

In addition to correctness, another important characteristic of a useful algorithm is its time and memory consumption. Time and memory are both valuable resources, and there are important differences (even when both are abundant) in how we can use them.

How can you measure resource consumption? One way is to create a function that describes the usage in terms of some characteristic of the input. One commonly used characteristic of an input dataset is its size. For example, suppose an algorithm takes as input an array of n integers. We can describe the time this algorithm takes as a function f written in terms of n. For example, we might write:

f(n) = n² + 3n + 14

where the value of f(n) is some unit of time (in this discussion the main focus will be on time, but we could do the same for memory consumption). Rarely are the units of time actually in seconds, because that would depend on the machine itself, the system it's running on, and its load. Instead, the units of time typically used are in terms of the number of some fundamental operation performed. For example, some fundamental operations we might care about are: the number of additions or multiplications needed; the number of element comparisons; the number of memory-location swaps performed; or the raw number of machine instructions executed. In general we might just refer to these fundamental operations performed as steps taken.

Is this a good approach to determine an algorithm's resource consumption? Yes and no. When two different algorithms are similar in time consumption, a precise function might help to determine which algorithm is faster under given conditions. But in many cases it is either difficult or impossible to calculate an analytical description of the exact number of operations needed, especially when the algorithm performs operations conditionally on the values of its input. Instead, what really matters is not the precise time required to complete the function, but rather how its resource consumption changes as the size of its input grows. Concretely, consider these two functions, representing the computation time required for each size of input dataset:

f(n) = n³ − 12n² + 20n + 110
g(n) = n³ + n² + 5n + 5

They look quite different, but how do they behave? Let's look at a few plots of the functions (f(n) is in red, g(n) in blue):

[Plot of f and g, in range 0 to 5]
[Plot of f and g, in range 0 to 15]
[Plot of f and g, in range 0 to 100]
[Plot of f and g, in range 0 to 1000]

In the first, very limited plot the curves appear somewhat different. In the second plot they start to behave similarly, in the third there is only a very small difference, and in the last they are virtually identical. In fact, they both approach n³, the dominant term. As n gets larger, the other terms become much less significant in comparison to n³.
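
The same convergence can be seen numerically: dividing each function by n³ gives a ratio that approaches 1 as n grows. A quick Python check (using the two functions defined above):

def f(n): return n**3 - 12*n**2 + 20*n + 110
def g(n): return n**3 + n**2 + 5*n + 5

for n in (10, 100, 1000, 10000):
    print(n, f(n) / n**3, g(n) / n**3)
# Both ratios move steadily toward 1, because the n^3 term dominates for large n.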

As you can see, modifying a polynomial-time algorithm's low-order coefficients doesn't help much. What really matters is the highest-order term. This is why we've adopted a notation for this kind of analysis. We say that:

f(n) = n³ − 12n² + 20n + 110 = O(n³)

We ignore the low-order terms. We can say that:

O(log n) ⊆ O(√n) ⊆ O(n) ⊆ O(n log n) ⊆ O(n²) ⊆ O(n³) ⊆ O(2ⁿ)

This gives us a way to more easily compare algorithms with each other. Running an insertion sort on n elements takes steps on the order of O(n²). Merge sort sorts in O(n log n) steps. Therefore, once the input dataset is large enough, merge sort is faster than insertion sort.

In general, we write

f(n) = O(g(n))

when

∃ c > 0, ∃ n₀ > 0, ∀ n ≥ n₀ : 0 ≤ f(n) ≤ c·g(n).

That is, f(n) = O(g(n)) holds if and only if there exist constants c and n₀ such that for all n ≥ n₀, f(n) is nonnegative and less than or equal to c·g(n).
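
For example, to see that the earlier f(n) = n² + 3n + 14 is O(n²), we can choose the witnesses c = 3 and n₀ = 4: for n ≥ 4 we have 3n ≤ n² and 14 ≤ n², so f(n) ≤ 3n². The finite Python check below is not a proof, just a quick sanity test of that choice:

def f(n): return n**2 + 3*n + 14

c, n0 = 3, 4
print(all(0 <= f(n) <= c * n**2 for n in range(n0, 10000)))   # True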

Note that the equal sign used in this notation describes a relationship between f(n) and g(n) instead of reflecting a true equality. In light of this, some define Big-O in terms of a set, stating that:

f(n) ∈ O(g(n))

when

f(n) ∈ {h(n) : ∃ c > 0, ∃ n₀ > 0, ∀ n ≥ n₀ : 0 ≤ h(n) ≤ c·g(n)}.

Big-O notation is only an upper bound; these two are both true:

n³ = O(n⁴)
n⁴ = O(n⁴)

If we use the equal sign as an equality we can get very strange results, such as:

n³ = n⁴

which is obviously nonsense. This is why the set definition is handy. You can avoid these things by thinking of the equal sign as a one-way equality, i.e.:

n³ = O(n⁴)

does not imply

O(n⁴) = n³

Always keep the O on the right hand side.

Big Omega

Sometimes, we want more than an upper bound on the behavior of a certain function. Big Omega provides a lower bound. In general, we say that

f(n) = Ω(g(n))

when

∃ c > 0, ∃ n₀ > 0, ∀ n ≥ n₀ : 0 ≤ c·g(n) ≤ f(n).

That is, f(n) = Ω(g(n)) if and only if there exist constants c and n₀ such that for all n ≥ n₀, f(n) is greater than or equal to c·g(n) ≥ 0.

So, for example, we can say that

n² − 2n = Ω(n²)  (c = 1/2, n₀ = 4), or
n² − 2n = Ω(n)  (c = 1, n₀ = 3),

but it is false to claim that

n² − 2n = Ω(n³).

Big Theta

When a given function is both O(g(n)) and Ω(g(n)), we say it is Θ(g(n)), and we have a tight bound on the function. A function f(n) is Θ(g(n)) when

∃ c₁ > 0, ∃ c₂ > 0, ∃ n₀ > 0, ∀ n ≥ n₀ : 0 ≤ c₁·g(n) ≤ f(n) ≤ c₂·g(n),

but most of the time, when we're trying to prove that a given f(n)=Θ(g(n)), instead of using this definition, we just show that it is both O(g(n)) and Ω(g(n)).

Little-O and Omega

When the asymptotic bound is not tight, we can express this by saying that f(n)=o(g(n)) or f(n)=ω(g(n)). The definitions are:

f(n) is o(g(n)) iff ∀ c > 0, ∃ n₀ > 0, ∀ n ≥ n₀ : 0 ≤ f(n) < c·g(n), and
f(n) is ω(g(n)) iff ∀ c > 0, ∃ n₀ > 0, ∀ n ≥ n₀ : 0 ≤ c·g(n) < f(n).

Note that a function f is in o(g(n)) when for any coefficient of g, g eventually gets larger than f, while for O(g(n)), there only has to exist a single coefficient for which g eventually gets at least as big as f.

[TODO: define what T(n,m) = O(f(n,m)) means. That is, when the running time of an algorithm has two dependent variables. Ex, a graph with n nodes and m edges. It's important to get the quantifiers correct!]

Algorithm Analysis: Solving Recurrence Equations

Merge sort of n elements: T(n) = 2T(n/2) + c(n). This describes one iteration of the merge sort: the problem space n is reduced to two halves (2T(n/2)), and the results are then merged back together at the end of all the recursive calls (c(n), the cost of the merge). This notation system is the bread and butter of algorithm analysis, so get used to it.

There are some theorems you can use to estimate the big Oh time for a function if its recurrence equation fits a certain pattern.

[TODO: write this section]

Substitution method

Formulate a guess about the big Oh time of your equation. Then use proof by induction to prove the guess is correct.

[TODO: write this section]

Summations

[TODO: show the closed forms of commonly needed summations and prove them]

Draw the Tree and Table

This is really just a way of getting an intelligent guess. You still have to go back to the substitution method in order to prove the big Oh time.

[TODO: write this section]

The Master Theorem

Consider a recurrence equation that fits the following formula:

T(n) = aT(n/b) + O(nᵏ)

for a ≥ 1, b > 1 and k ≥ 0. Here, a is the number of recursive calls made per call to the function, n is the input size, b is the factor by which the input size is reduced, and k is the polynomial order of an operation that occurs each time the function is called (except for the base cases). For example, in the merge sort algorithm covered later, we have

T(n) = 2T(n/2) + O(n)

because two subproblems are called for each non-base case iteration, and the size of the array is divided in half each time. The O(n) at the end is the "conquer" part of this divide and conquer algorithm: it takes linear time to merge the results from the two recursive calls into the final result.

Thinking of the recursive calls of T as forming a tree, there are three possible cases to determine where most of the algorithm is spending its time ("most" in this sense is concerned with its asymptotic behaviour):

  1. the tree can be top heavy, and most time is spent during the initial calls near the root;
  2. the tree can have a steady state, where time is spread evenly; or
  3. the tree can be bottom heavy, and most time is spent in the calls near the leaves.

Depending upon which of these three states the tree is in, T will have different complexities:

The Master Theorem

Given T(n) = aT(n/b) + O(nᵏ) for a ≥ 1, b > 1 and k ≥ 0:

  • If a < bᵏ, then T(n) = O(nᵏ)  (top heavy)
  • If a = bᵏ, then T(n) = O(nᵏ log n)  (steady state)
  • If a > bᵏ, then T(n) = O(n^(log_b a))  (bottom heavy)

For the merge sort example above, where

T(n) = 2T(n/2) + O(n)

we have

a = 2, b = 2, k = 1, so bᵏ = 2;

thus a = bᵏ, and so this is the "steady state" case. By the master theorem, the complexity of merge sort is thus

T(n) = O(n¹ log n) = O(n log n).
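
A rough way to sanity-check this result is to iterate the recurrence numerically and compare it against n log n. The Python sketch below (our own illustration, taking the O(n) term to be exactly n and T(1) = 1) shows the ratio settling toward a constant:

from functools import lru_cache
from math import log2

@lru_cache(maxsize=None)
def T(n):
    # Merge sort recurrence: two half-size subproblems plus linear merge work.
    if n <= 1:
        return 1
    return 2 * T(n // 2) + n

for n in (2**10, 2**15, 2**20):
    print(n, T(n) / (n * log2(n)))   # approaches 1, consistent with T(n) = O(n log n)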

Amortized Analysis

[Start with an adjacency list representation of a graph and show two nested for loops: one for each node n, and nested inside that, one loop for each edge e. If there are n nodes and m edges, this could lead you to say the loop takes O(nm) time. However, only once could the inner loop take that long, and a tighter bound is O(n + m).]
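
As a concrete sketch of the loop structure described in that note (plain Python, with the adjacency list represented as a dictionary from each node to its list of neighbours; the example graph is ours):

graph = {
    0: [1, 2],
    1: [2],
    2: [0, 3],
    3: [],
}

edge_visits = 0
for node in graph:                  # executes n times
    for neighbour in graph[node]:   # executes deg(node) times for this node
        edge_visits += 1

# Summed over all nodes, the inner loop body runs exactly m times,
# so the whole traversal is O(n + m), not O(n * m).
print(len(graph), edge_visits)      # 4 nodes, 5 edge visits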


