If a problem can be verified in polynomial time, then it's in the NP class. Another definition: if the language of a problem can be accepted by some polynomial-bounded non-deterministic Turing machine. If the language can be accepted by some polynomial-bounded deterministic Turing machine, then it's in P. Thus, P \in NP.
Apparently finding the solution itself can be hard, although the verification is often trivial.
Reducibility: If A reduces to B in polynomial time. B can be harder? At least as hard.
like SAT <=p 3SAT. 3SAT is harder. If we can solve 3SAT, we solve all SAT problems.
If all problems in NP can be reduced to A, then A is NP-hard. A doesn't have to a NP problem. So there are some NP-hard problems that are not in NP? If we solve a NP-hard problem, we solve all NP problems, including those NP-complete ones.
Seems not all NP-hard problems are equally hard. Those are not in NP are harder?
If we can show a problem is in NP, and it's NP-hard, then it's a NP-complete problem. seems NP-Complete are the hardest problems in NP? And also are all NP-complete problems equivalent?
So next mission for scientist: find one NP-complete problem and solve it!
Saturday, May 1, 2010
Sunday, April 25, 2010
Expectation Maximization
EM algorithm is a way of optimization that deals with hidden data. It consists of two steps:
1. Expectation: fill-in the hidden values so the function (e.g. likelihood) can be computed.
2. Maximization: maximize the function above
K-means:
choose some seed centers
classify: assign each point's membership. (M)
recenter: recalculate the center. (E)
k-means optimizes \min_mu \min_C \sum_i |\mu_C(i) - x_i|^2. C(i) is membership center for point x_i. \mu_j is the center for class j. k-means converges because it increases this function in each iteration (or it stops increasing when converged)
Gaussian Mixture:
choose some seed center. so we know which point comes from which component \mu
if we know \mu for each point, then the likelihood can be computed! (E)
then maximize it! (M) we will get a set of new \mu. Google to find the updating rule.
GMM is soft clustering since each point belongs to some component only with a probability. K-means is hard clustering.
EM actually maximizes the lower bound of the marginal likelihood.
Chengxiang Zhai has a nice article on EM's application to language modeling.
1. Expectation: fill-in the hidden values so the function (e.g. likelihood) can be computed.
2. Maximization: maximize the function above
K-means:
choose some seed centers
classify: assign each point's membership. (M)
recenter: recalculate the center. (E)
k-means optimizes \min_mu \min_C \sum_i |\mu_C(i) - x_i|^2. C(i) is membership center for point x_i. \mu_j is the center for class j. k-means converges because it increases this function in each iteration (or it stops increasing when converged)
Gaussian Mixture:
choose some seed center. so we know which point comes from which component \mu
if we know \mu for each point, then the likelihood can be computed! (E)
then maximize it! (M) we will get a set of new \mu. Google to find the updating rule.
GMM is soft clustering since each point belongs to some component only with a probability. K-means is hard clustering.
EM actually maximizes the lower bound of the marginal likelihood.
Chengxiang Zhai has a nice article on EM's application to language modeling.
Thursday, April 22, 2010
Logistic Regression vs. Gaussian Naive Bayes
P(Y=1|X) = 1 / 1+exp(w_0+\sum(w_i X_i))
P(Y=0|X) = exp(w_0+\sum(w_i X_i)) / 1+exp(w_0+\sum(w_i X_i))
log(P(Y=0|X)/P(Y=1|X)) = w_0 +\sum(w_i X_i). So LR has a linear decision boundary.
It's easy to show that LR and GNB
LR doesn't require independence assumption, while Naive Bayesian does. When the data assumption doesn't hold, LR often outperforms GNB.
When feature space's assumption holds, LR and NB learn an identical classifier when # training examples approaches infinity.
GNB converges towards it asymptotic accuracy faster than LR:
When dataset is small, GNB > LR
When dataset is big, LR > GNR
LR estimates P(Y|X) directly, hence discriminative. NB estimates P(X|Y) and P(Y), hence generative.
We use MLE to estimate parameters of GNB, but why use gradient to LR? Because we don't want to assume GNB's feature independence.
P(Y=0|X) = exp(w_0+\sum(w_i X_i)) / 1+exp(w_0+\sum(w_i X_i))
log(P(Y=0|X)/P(Y=1|X)) = w_0 +\sum(w_i X_i). So LR has a linear decision boundary.
It's easy to show that LR and GNB
LR doesn't require independence assumption, while Naive Bayesian does. When the data assumption doesn't hold, LR often outperforms GNB.
When feature space's assumption holds, LR and NB learn an identical classifier when # training examples approaches infinity.
GNB converges towards it asymptotic accuracy faster than LR:
When dataset is small, GNB > LR
When dataset is big, LR > GNR
LR estimates P(Y|X) directly, hence discriminative. NB estimates P(X|Y) and P(Y), hence generative.
We use MLE to estimate parameters of GNB, but why use gradient to LR? Because we don't want to assume GNB's feature independence.
Overfitting
Overfitting refers to the behavior when the model learned has a very poor representation of the original underlying function, i.e. it becomes so complicated, even fits the outliers, and doesn't generalize, e.g. to work well on other datasets.
Typically the more parameters the model has, the better it fits the training data, thus overfitting occurs. However, overfitting problems becomes less severe as the size of the data set increases since the larger data set is, the more complex model we can afford. Regularization is one way of control overfitting.
Overfitting problem is a general property of maximum likelihood, and Bayesian approach can effectively avoid overfitting.
Typically the more parameters the model has, the better it fits the training data, thus overfitting occurs. However, overfitting problems becomes less severe as the size of the data set increases since the larger data set is, the more complex model we can afford. Regularization is one way of control overfitting.
Overfitting problem is a general property of maximum likelihood, and Bayesian approach can effectively avoid overfitting.
Tuesday, April 20, 2010
MLE vs. MAP vs. biased estimator
MLE: you maximize p(x_1,x_2,...x_n)
MAP: you maximize p(u|x_1,x_2,...,x_n)
Mean:
for Gaussian,
Unbiased estimator:
if expected value of the estimate == true value
MAP: we want to maximize p(\mu|X) = p(X|\mu)p(\mu)/p(X) = \arg max p(X|\mu)p(\mu)
MAP with a uniform (uninformative) prior reduces to MLE estimator. i.e p(\mu|X) = \arg max p(x|\mu)
MAP: you maximize p(u|x_1,x_2,...,x_n)
Mean:
\mu =
for Gaussian,
\mu_{\tiny MLE} = \frac{\sum(x_i)}{N}
\sigma_{\tiny MLE}^2 = \frac{\sum(x_i - \mu_{MLE})^2}{N}
Unbiased estimator:
if expected value of the estimate == true value
E[\mu_{\tiny MLE}] = \frac{E[\sum(x_i])}{N}=\frac{\sum(E[x_i])}{N}=\mu. YES
E[\sigma_{\tiny MLE}] NO
MAP: we want to maximize p(\mu|X) = p(X|\mu)p(\mu)/p(X) = \arg max p(X|\mu)p(\mu)
MAP with a uniform (uninformative) prior reduces to MLE estimator. i.e p(\mu|X) = \arg max p(x|\mu)
Monday, April 19, 2010
Relevance Feedback
In a document retrieval system, when user issues an initial query, the list of documents system returns often consists of relevant and irrelevant documents. After user provides some feedback on the retrieval relevance, explicitly or implicitly, system is able to return a better list of documents by formulating a new query, with most relevant ones ranked first and irrelevant on the bottom.
The theory is try to maximize the similarity between query and relevant documents, while minimizing the similarity between query and irrelevant documents.
How to generate a new query: 1. add new terms. 2. reweight query terms.
Rocchio algorithm is an example of incorporating feedback to the modified query. The effect is to move query towards the centroid of relevant documents, and move away from the centroid irrelevant documents.
Users are often reluctant to provide feedback to prolong the search interaction. Pseudo-feedback
The theory is try to maximize the similarity between query and relevant documents, while minimizing the similarity between query and irrelevant documents.
How to generate a new query: 1. add new terms. 2. reweight query terms.
q = \arg max_q [sim(q,C_r)-sim(q,C_nr)]
Rocchio algorithm is an example of incorporating feedback to the modified query. The effect is to move query towards the centroid of relevant documents, and move away from the centroid irrelevant documents.
Users are often reluctant to provide feedback to prolong the search interaction. Pseudo-feedback
Saturday, April 10, 2010
strange C problem
void foo() {
int tmp = 1;
int tmp;
}
See if it compiles.
For me, no warning. no error. no nothing. but also no object file generated. doesn't gcc check symbol table for duplicate definition?
int tmp = 1;
int tmp;
}
See if it compiles.
For me, no warning. no error. no nothing. but also no object file generated. doesn't gcc check symbol table for duplicate definition?
Subscribe to:
Posts (Atom)