Project ideas
Note: this list is not exhaustive and you're welcome to suggest your own project ideas (some of you already have). For projects with a sufficiently high work load, I'll allow working in pairs.
- Card shuffling. Many possibilities here; one example: how many shuffles are necessary to be "close" to the uniform distribution on all card orders? See here and here for some beautiful theory and project ideas.
- Properties of random permutations. Draw a permutation on n elements uniformly at random from the n! possibilities. What is the expected length of the longest cycle? What about the longest increasing subsequence? What is the expected size of the permutation subgroup generated by two random permutations? See the names in boxes problem and solution (#1).
- Geometric random walks. How do you uniformly sample a point inside a high-dimensional sphere? Ellipsoid? Simplex? Other convex body?..
- Gibbs sampling
- Rate of convergence in Central Limit Theorem. We all know that if $X_1,X_2,\ldots,X_n$ is a sequence of independent, identically distributed random variables,
with common mean $\mu$
and common variance $\sigma^2$,
then their centered, normalized sum,
$$ Z_n = \frac{ \sum_{i=1}^n (X_i - \mu) }{\sigma\sqrt{n}}$$converges in distribution to the standard normal distribution $N(0,1)$. But how rapidly does $Z_n$ converge to $N(0,1)$ – what's the convergence rate, as a function of $n$?
- The Chernoff bound is a simple and powerful concentration inequality for binary random variables, but it requires their independence. What if the $X_i$ form a Markov chain?
- Recall the coin experiment we did in the first lesson. For $n$ tosses of a fair coin, define the random variable $L_n$ as the longest contiguous sequence of heads. How does $L_n$ behave? What about for unfair (biased) coins? What if the coin tosses form a Markov chain? Define a related random variable $R_{n,k}$ to be the number of non-overlapping contiguous sequences of heads of length $k$ out of $n$ tosses. How does this random variable behave?
- Properties of random deterministic finite-state automata (DFAs). Let's draw a random DFA $M$ on $n$ states by letting each state, upon reading the letter $\sigma$, transition to each of the $n$ states uniformly at random. In addition, each state is labeled as accepting or rejecting independently with probability $1/2$, and state 1 is always the starting state. Having defined a random DFA model, we can ask all kinds of questions about it. For an $n$-state DFA drawn as above, what is the expected size of the equivalent minimal DFA? What distribution does this induce on languages $\mathcal{L}_n$ – the family of regular languages recognized by DFAs with at most $n$ states? Define a metric on the DFAs as follows: for two machines $A,B$, their distance is defined to be the size of the minimal DFA accepting their symmetric difference, minus 1. What is the average distance between $n$-state DFAs?
- Properties of random graphs – a very rich and beautiful area. Many natural questions can be explored: What is the expected number of connected components? What is the size of the largest one? Expected diameter, girth, etc? What about models other than the Erdős–Rényi model?
- Random walks and percolation
- Random matrices and their eigenvalues (open questions pop up all the time)
- Various cover time problems
- Planar geometry: expected area of random triangle, expected area/degree of convex hull
Requirements and grading criteria
Projects are due on 30 Jul 2013. A project will consist of a programming component (your sampling program) and a paper component (your report). There is no length requirement, but 10 pages should more than suffice for a typical report. The report should follow the following outline:
- Problem statement: what problem are you solving? E.g., in the case of card shuffling, the problem is to obtain a "perfect shuffle". So you need to state this, define perfect shuffle, and explain where the technical difficulties lie (in the case of shuffling, the state space has 52! elements). Note: to receive full credit here, you'll need a precise mathematical statement of the problem. Define your probability space, the distribution you want to sample from, etc.
- Methods: Give a precise mathematical description of the algorithm you are using to solve the problem in (1). Anybody who reads you paper should be able to reproduce your numerical results.
- Results: Present your simulation results in a visually comprehensible way (neat tables, charts, graphs, plots). Be sure to accompany each figure with a clear description of what it illustrates.
- Analysis: How "good" was your simulation – i.e., how close was it to the distribution you actually wanted to sample from? You'll need to define precisely your performance measure (measure of "goodness") and explain how you are computing it. In some cases, you should be able to provide a mathematical analysis of the results (e.g., "if I run my algorithm for 100 steps, then with probability at least .98 I'll be drawing from a distribution that is within .03 of the uniform distribution, under the total variation metric"). We'll discuss the requirements for each project on an individual basis.
See this paper for an example of a report with a precise problem statement (Sec. 1), methods (Sec. 3), results (pages 11-15), and analysis (Sec. 1 and 2). Note that this paper also provides a detailed background literature survey, which I am not asking you to do.
I will also ask to see your code, which will be graded on readability, elegance, and efficiency.
Citing sources
This should be obvious but I'll say it anyway: please cite all sources in your work. If you got an idea or a solution from a person, a book, an article, an internet page – please say so. A failure to do this is plagiarism, and is a very serious offense.It is not sufficient to just give a list of sources at the end. You need to acknowledge each idea, formula, quotation that is not your original work. Any such "borrowed" idea needs to be clearly acknowledged the first time it appears in your report.
Please use your original words and don't copy more than a sentence or so of verbatim text. I am interested in your code and your analysis – not your ability to transcribe text verbatim.
Each submitted project should be accompanied by a declaration of originality.