Sunday, April 25, 2010

Expectation Maximization

EM algorithm is a way of optimization that deals with hidden data. It consists of two steps:

1. Expectation: fill-in the hidden values so the function (e.g. likelihood) can be computed.
2. Maximization: maximize the function above

K-means:
choose some seed centers

classify: assign each point's membership. (M)
recenter: recalculate the center. (E)

k-means optimizes \min_mu \min_C \sum_i |\mu_C(i) - x_i|^2. C(i) is membership center for point x_i. \mu_j is the center for class j. k-means converges because it increases this function in each iteration (or it stops increasing when converged)

Gaussian Mixture:

choose some seed center. so we know which point comes from which component \mu

if we know \mu for each point, then the likelihood can be computed! (E)
then maximize it! (M) we will get a set of new \mu. Google to find the updating rule.

GMM is soft clustering since each point belongs to some component only with a probability. K-means is hard clustering.

EM actually maximizes the lower bound of the marginal likelihood.

Chengxiang Zhai has a nice article on EM's application to language modeling.

0 comments: