P(X | I, want) = ? That is, what is the probability of seeing a word X after the sequence of words "I want"? n-gram models answer such questions by estimating the probability of sequences of n words, that is, by estimating P(wn | w1...wn-1).
The objective of the learning method is to estimate p(w1...wn) given the counts C(w1...wn) observed in a training corpus.
The simplest method to derive the estimate p from the counts C is to use only the data seen in the corpus. This method is called Maximum Likelihood Estimation (MLE). MLE is in general a poor estimator, because it assigns probability 0 to every n-gram that does not appear in the corpus. Since a corpus contains, in general, only a tiny fraction of the theoretically possible n-grams, MLE is not an appropriate estimator.
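The zero-probability problem can be seen in a minimal sketch of MLE bigram estimation (the toy corpus and function name here are illustrative, not from the source): P(w2 | w1) is estimated as C(w1, w2) / C(w1), so any bigram absent from the corpus gets probability 0.

```python
from collections import Counter

def mle_bigram_probs(corpus):
    """MLE estimate of bigram probabilities:
    P(w2 | w1) = C(w1, w2) / C(w1)."""
    unigram_counts = Counter(corpus)
    bigram_counts = Counter(zip(corpus, corpus[1:]))
    return {(w1, w2): c / unigram_counts[w1]
            for (w1, w2), c in bigram_counts.items()}

corpus = "i want to eat i want to sleep".split()
probs = mle_bigram_probs(corpus)

print(probs[("i", "want")])               # C(i, want) / C(i) = 2/2 = 1.0
print(probs.get(("want", "food"), 0.0))   # unseen bigram: MLE assigns 0.0
```

Any word sequence containing a single unseen bigram is therefore judged impossible by the model, which is what smoothing methods are designed to correct.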
The following sources present better methods for smoothing - that is, the process of deriving a good probability estimate from frequency counts under data sparsity: