<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-1244231074627122549</id><updated>2011-07-28T22:02:24.425-07:00</updated><title type='text'>Machine Learning and Algorithms</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://yandongliu.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://yandongliu.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Yandong Liu</name><uri>http://www.blogger.com/profile/16061261112583734294</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>19</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-1244231074627122549.post-5936769626019560267</id><published>2010-05-01T14:35:00.000-07:00</published><updated>2010-05-01T15:07:05.121-07:00</updated><title type='text'>NP, NP-complete and NP-hard</title><content type='html'>If a problem can be verified in polynomial time, then it's in the NP class. Another definition: if the language of a problem can be accepted by some polynomial-bounded non-deterministic Turing machine. If the language can be accepted by some polynomial-bounded deterministic Turing machine, then it's in P. Thus, P \in NP.&lt;br /&gt;&lt;br /&gt;Apparently finding the solution itself can be hard, although the verification is often trivial. &lt;br /&gt;&lt;br /&gt;Reducibility: If A reduces to B in polynomial time. B can be harder? At least as hard.&lt;br /&gt;like SAT &lt;=p 3SAT. 3SAT is harder. If we can solve 3SAT, we solve all SAT problems.&lt;br /&gt;&lt;br /&gt;If all problems in NP can be reduced to A, then A is NP-hard. A doesn't have to a NP problem. So there are some NP-hard problems that are not in NP? If we solve a NP-hard problem, we solve all NP problems, including those NP-complete ones.&lt;br /&gt;&lt;br /&gt;Seems not all NP-hard problems are equally hard. Those are not in NP are harder?&lt;br /&gt;&lt;br /&gt;If we can show a problem is in NP, and it's NP-hard, then it's a NP-complete problem. seems NP-Complete are the hardest problems in NP? And also are all NP-complete problems equivalent?&lt;br /&gt;&lt;br /&gt;So next mission for scientist: find one NP-complete problem and solve it!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1244231074627122549-5936769626019560267?l=yandongliu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://yandongliu.blogspot.com/feeds/5936769626019560267/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1244231074627122549&amp;postID=5936769626019560267' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/5936769626019560267'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/5936769626019560267'/><link rel='alternate' type='text/html' href='http://yandongliu.blogspot.com/2010/05/np-np-complete-and-np-hard.html' title='NP, NP-complete and NP-hard'/><author><name>Yandong Liu</name><uri>http://www.blogger.com/profile/16061261112583734294</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1244231074627122549.post-2718494263940687169</id><published>2010-04-25T15:49:00.001-07:00</published><updated>2010-04-25T18:59:44.758-07:00</updated><title type='text'>Expectation Maximization</title><content type='html'>EM algorithm is a way of optimization that deals with hidden data. It consists of two steps:&lt;br /&gt;&lt;br /&gt;1. Expectation: fill-in the hidden values so the function (e.g. likelihood) can be computed.&lt;br /&gt;2. Maximization: maximize the function above&lt;br /&gt;&lt;br /&gt;K-means:&lt;br /&gt;  choose some seed centers&lt;br /&gt;&lt;br /&gt;  classify: assign each point's membership. (M)&lt;br /&gt;  recenter: recalculate the center. (E)&lt;br /&gt;&lt;br /&gt;k-means optimizes \min_mu \min_C \sum_i |\mu_C(i) - x_i|^2. C(i) is membership center for point x_i. \mu_j is the center for class j. k-means converges because it increases this function in each iteration (or it stops increasing when converged)&lt;br /&gt;&lt;br /&gt;Gaussian Mixture:&lt;br /&gt;&lt;br /&gt;   choose some seed center. so we know which point comes from which component \mu&lt;br /&gt;&lt;br /&gt;   if we know \mu for each point, then the likelihood can be computed! (E)&lt;br /&gt;   then maximize it! (M) we will get a set of new \mu. Google to find the updating rule.&lt;br /&gt;&lt;br /&gt;GMM is soft clustering since each point belongs to some component only with a probability. K-means is hard clustering.&lt;br /&gt;&lt;br /&gt;EM actually maximizes the lower bound of the marginal likelihood.&lt;br /&gt;&lt;br /&gt;Chengxiang Zhai has a nice article on EM's application to language modeling.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1244231074627122549-2718494263940687169?l=yandongliu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://yandongliu.blogspot.com/feeds/2718494263940687169/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1244231074627122549&amp;postID=2718494263940687169' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/2718494263940687169'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/2718494263940687169'/><link rel='alternate' type='text/html' href='http://yandongliu.blogspot.com/2010/04/expectation-maximization.html' title='Expectation Maximization'/><author><name>Yandong Liu</name><uri>http://www.blogger.com/profile/16061261112583734294</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1244231074627122549.post-6705210691922906761</id><published>2010-04-22T22:32:00.000-07:00</published><updated>2010-04-22T23:07:24.478-07:00</updated><title type='text'>Logistic Regression vs. Gaussian Naive Bayes</title><content type='html'>P(Y=1|X) = 1 / 1+exp(w_0+\sum(w_i X_i))&lt;br /&gt;P(Y=0|X) = exp(w_0+\sum(w_i X_i)) / 1+exp(w_0+\sum(w_i X_i))&lt;br /&gt;&lt;br /&gt;log(P(Y=0|X)/P(Y=1|X)) = w_0 +\sum(w_i X_i).  So LR has a linear decision boundary.&lt;br /&gt;&lt;br /&gt;It's easy to show that LR and GNB&lt;br /&gt;&lt;br /&gt;LR doesn't require independence assumption, while Naive Bayesian does. When the data assumption doesn't hold, LR often outperforms GNB. &lt;br /&gt;&lt;br /&gt;When feature space's assumption holds, LR and NB learn an identical classifier when # training examples approaches infinity.&lt;br /&gt;&lt;br /&gt;GNB converges towards it asymptotic accuracy faster than LR:&lt;br /&gt;    When dataset is small, GNB &gt; LR&lt;br /&gt;    When dataset is big, LR &gt; GNR&lt;br /&gt;&lt;br /&gt;LR estimates P(Y|X) directly, hence discriminative. NB estimates P(X|Y) and P(Y), hence generative.&lt;br /&gt;&lt;br /&gt;We use MLE to estimate parameters of GNB, but why use gradient to LR? Because we don't want to assume GNB's feature independence.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1244231074627122549-6705210691922906761?l=yandongliu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://yandongliu.blogspot.com/feeds/6705210691922906761/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1244231074627122549&amp;postID=6705210691922906761' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/6705210691922906761'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/6705210691922906761'/><link rel='alternate' type='text/html' href='http://yandongliu.blogspot.com/2010/04/logistic-regression-vs-gnb.html' title='Logistic Regression vs. Gaussian Naive Bayes'/><author><name>Yandong Liu</name><uri>http://www.blogger.com/profile/16061261112583734294</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1244231074627122549.post-2139744541872487487</id><published>2010-04-22T21:14:00.000-07:00</published><updated>2010-04-22T21:22:48.116-07:00</updated><title type='text'>Overfitting</title><content type='html'>Overfitting refers to the behavior when the model learned has a very poor representation of the original underlying function, i.e. it becomes so complicated, even fits the outliers, and doesn't generalize, e.g. to work well on other datasets.&lt;br /&gt;&lt;br /&gt;Typically the more parameters the model has, the better it fits the training data, thus overfitting occurs. However, overfitting problems becomes less severe as the size of the data set increases since the larger data set is, the more complex model we can afford. Regularization is one way of control overfitting.&lt;br /&gt;&lt;br /&gt;Overfitting problem is a general property of maximum likelihood, and Bayesian approach can effectively avoid overfitting.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1244231074627122549-2139744541872487487?l=yandongliu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://yandongliu.blogspot.com/feeds/2139744541872487487/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1244231074627122549&amp;postID=2139744541872487487' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/2139744541872487487'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/2139744541872487487'/><link rel='alternate' type='text/html' href='http://yandongliu.blogspot.com/2010/04/overfitting.html' title='Overfitting'/><author><name>Yandong Liu</name><uri>http://www.blogger.com/profile/16061261112583734294</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1244231074627122549.post-9187256401654634809</id><published>2010-04-20T13:50:00.000-07:00</published><updated>2010-04-20T17:41:43.030-07:00</updated><title type='text'>MLE vs. MAP vs. biased estimator</title><content type='html'>MLE: you maximize p(x_1,x_2,...x_n)&lt;br /&gt;MAP: you maximize p(u|x_1,x_2,...,x_n)&lt;br /&gt;&lt;br /&gt;Mean: &lt;br /&gt;&lt;br /&gt;&lt;pre lang="eq.latex"&gt;&lt;br /&gt;\mu = &lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;for Gaussian,&lt;br /&gt;&lt;br /&gt;&lt;pre lang="eq.latex"&gt;&lt;br /&gt;&lt;br /&gt;\mu_{\tiny MLE} = \frac{\sum(x_i)}{N}&lt;br /&gt;\sigma_{\tiny MLE}^2 = \frac{\sum(x_i - \mu_{MLE})^2}{N}&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Unbiased estimator:&lt;br /&gt;&lt;br /&gt;if expected value of the estimate == true value&lt;br /&gt;&lt;br /&gt;&lt;pre lang="eq.latex"&gt;&lt;br /&gt;&lt;br /&gt;E[\mu_{\tiny MLE}] = \frac{E[\sum(x_i])}{N}=\frac{\sum(E[x_i])}{N}=\mu.  YES&lt;br /&gt;E[\sigma_{\tiny MLE}] NO&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;MAP: we want to maximize p(\mu|X) = p(X|\mu)p(\mu)/p(X) = \arg max p(X|\mu)p(\mu)&lt;br /&gt;&lt;br /&gt;MAP with a uniform (uninformative) prior reduces to MLE estimator. i.e p(\mu|X) = \arg max p(x|\mu)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1244231074627122549-9187256401654634809?l=yandongliu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://yandongliu.blogspot.com/feeds/9187256401654634809/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1244231074627122549&amp;postID=9187256401654634809' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/9187256401654634809'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/9187256401654634809'/><link rel='alternate' type='text/html' href='http://yandongliu.blogspot.com/2010/04/mle-vs-map-vs-biased-estimator.html' title='MLE vs. MAP vs. biased estimator'/><author><name>Yandong Liu</name><uri>http://www.blogger.com/profile/16061261112583734294</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1244231074627122549.post-7538165714899905970</id><published>2010-04-19T18:37:00.000-07:00</published><updated>2010-04-19T21:56:05.042-07:00</updated><title type='text'>Relevance Feedback</title><content type='html'>In a document retrieval system, when user issues an initial query, the list of documents system returns often consists of relevant and irrelevant documents. After user provides some &lt;span style="font-style:italic;"&gt;feedback&lt;/span&gt; on the retrieval relevance, explicitly or implicitly, system is able to return a better list of documents by formulating a new query, with most relevant ones ranked first and irrelevant on the bottom.&lt;br /&gt;&lt;br /&gt;The theory is try to maximize the similarity between query and relevant documents, while minimizing the similarity between query and irrelevant documents.&lt;br /&gt;&lt;br /&gt;How to generate a new query: 1. add new terms. 2. reweight query terms.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre lang="eq.latex"&gt;&lt;br /&gt;&lt;br /&gt;q = \arg max_q [sim(q,C_r)-sim(q,C_nr)]&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Rocchio algorithm is an example of incorporating feedback to the modified query. The effect is to move query towards the centroid of relevant documents, and move away from the centroid irrelevant documents.&lt;br /&gt;&lt;br /&gt;Users are often reluctant to provide feedback to prolong the search interaction. Pseudo-feedback&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1244231074627122549-7538165714899905970?l=yandongliu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://yandongliu.blogspot.com/feeds/7538165714899905970/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1244231074627122549&amp;postID=7538165714899905970' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/7538165714899905970'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/7538165714899905970'/><link rel='alternate' type='text/html' href='http://yandongliu.blogspot.com/2010/04/relevance-feedback.html' title='Relevance Feedback'/><author><name>Yandong Liu</name><uri>http://www.blogger.com/profile/16061261112583734294</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1244231074627122549.post-9146679152715525751</id><published>2010-04-10T18:52:00.000-07:00</published><updated>2010-04-10T18:55:09.865-07:00</updated><title type='text'>strange C problem</title><content type='html'>void foo() {&lt;br /&gt;        int tmp = 1;&lt;br /&gt;&lt;br /&gt;        int tmp;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;See if it compiles. &lt;br /&gt;&lt;br /&gt;For me,  no warning. no error. no nothing. but also no object file generated. doesn't gcc check symbol table for duplicate definition?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1244231074627122549-9146679152715525751?l=yandongliu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://yandongliu.blogspot.com/feeds/9146679152715525751/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1244231074627122549&amp;postID=9146679152715525751' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/9146679152715525751'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/9146679152715525751'/><link rel='alternate' type='text/html' href='http://yandongliu.blogspot.com/2010/04/c-program.html' title='strange C problem'/><author><name>Yandong Liu</name><uri>http://www.blogger.com/profile/16061261112583734294</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1244231074627122549.post-3161570247888167258</id><published>2010-04-01T23:14:00.000-07:00</published><updated>2010-04-01T23:16:25.182-07:00</updated><title type='text'>Perimeter of a triangle inscribed in a circle</title><content type='html'>is not fixed. Consider the easy cases of a right triangle with each corner being 60 degrees and the one with one side going through the center of the circle&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1244231074627122549-3161570247888167258?l=yandongliu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://yandongliu.blogspot.com/feeds/3161570247888167258/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1244231074627122549&amp;postID=3161570247888167258' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/3161570247888167258'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/3161570247888167258'/><link rel='alternate' type='text/html' href='http://yandongliu.blogspot.com/2010/04/perimeter-of-triangle-inscribed-in.html' title='Perimeter of a triangle inscribed in a circle'/><author><name>Yandong Liu</name><uri>http://www.blogger.com/profile/16061261112583734294</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1244231074627122549.post-2389321121047320975</id><published>2010-02-21T17:22:00.000-08:00</published><updated>2010-02-21T17:27:51.781-08:00</updated><title type='text'>Mahalanobis distance</title><content type='html'>Wikipedia: It is a useful way of determining similarity of an unknown sample set to a known one. It differs from Euclidean distance in that it takes into account the correlations of the data set and is scale-invariant, i.e. not dependent on the scale of measurements.&lt;br /&gt;&lt;br /&gt;For&lt;br /&gt;&lt;pre lang="eq.latex"&gt;&lt;br /&gt; x= (x_1, x_2, x_3, ..., x_N)^T&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;, and covariance matrix S, &lt;br /&gt;&lt;pre lang="eq.latex"&gt;&lt;br /&gt;&lt;br /&gt;D_m(x) = \sqrt{(x-\mu)^TS^{-1}(x-\mu)}&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre lang="eq.latex"&gt;&lt;br /&gt;d(x,y)=\sqrt{(x-y)^TS^{-1}(x-y)}&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1244231074627122549-2389321121047320975?l=yandongliu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://yandongliu.blogspot.com/feeds/2389321121047320975/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1244231074627122549&amp;postID=2389321121047320975' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/2389321121047320975'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/2389321121047320975'/><link rel='alternate' type='text/html' href='http://yandongliu.blogspot.com/2010/02/mahalanobis-distance.html' title='Mahalanobis distance'/><author><name>Yandong Liu</name><uri>http://www.blogger.com/profile/16061261112583734294</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1244231074627122549.post-127817095816171256</id><published>2010-02-21T17:18:00.000-08:00</published><updated>2010-02-22T00:47:44.118-08:00</updated><title type='text'>Some ways of inserting Latex code in your apps</title><content type='html'>test&lt;br /&gt;&lt;br /&gt;&lt;pre lang="eq.latex"&gt;&lt;br /&gt;\int_{0}^{1}\frac{x^{4}\left(1-x\right)^{4}}{1+x^{2}}dx&lt;br /&gt;=\frac{22}{7}-\pi&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1244231074627122549-127817095816171256?l=yandongliu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://yandongliu.blogspot.com/feeds/127817095816171256/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1244231074627122549&amp;postID=127817095816171256' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/127817095816171256'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/127817095816171256'/><link rel='alternate' type='text/html' href='http://yandongliu.blogspot.com/2010/02/latex-test.html' title='Some ways of inserting Latex code in your apps'/><author><name>Yandong Liu</name><uri>http://www.blogger.com/profile/16061261112583734294</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1244231074627122549.post-1627362789842249890</id><published>2010-02-13T00:14:00.000-08:00</published><updated>2010-02-13T00:42:52.182-08:00</updated><title type='text'>Entropy</title><content type='html'>[Wikipedia]: Shannon's entropy represents an absolute limit on the best possible lossless compression&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;for a fair coin X: p(head) = p(tail) = 0.5.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;H(X) = -2*0.5*log(0.5) = 1.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;when it's not fair, let's say p(head) = 0.6, then&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;H(X) = -0.6*log(0.6)-0.4*log(0.4) = 0.97&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The more even the distribution is (uncertainty is high), the more bits needed for encoding.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;#include &lt;stdio.h&gt;&lt;/stdio.h&gt;&lt;/div&gt;&lt;div&gt;#include &lt;stdlib.h&gt;&lt;/stdlib.h&gt;&lt;/div&gt;&lt;div&gt;#include &lt;math.h&gt;&lt;/math.h&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;int main(int argc, char* argv[]) {&lt;/div&gt;&lt;div&gt;    if(argc &lt;&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;printf("entropy filename\n");&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;exit(0);&lt;/div&gt;&lt;div&gt;    }&lt;/div&gt;&lt;div&gt;    char* fn = argv[1];&lt;/div&gt;&lt;div&gt;    FILE* fp = fopen(fn,"r");&lt;/div&gt;&lt;div&gt;    char line[255];&lt;/div&gt;&lt;div&gt;    float c;&lt;/div&gt;&lt;div&gt;    float e = 0.0f;&lt;/div&gt;&lt;div&gt;    if(fp != NULL) {&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;while(fscanf(fp,"%f", &amp;amp;c)!=EOF) {&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;    printf("%f\n",c);&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;    e += c*log(c)/log(2);&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;}&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;close(fp);&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt; &lt;/span&gt;printf("Entropy:%f\n", 0-e);&lt;/div&gt;&lt;div&gt;    }&lt;/div&gt;&lt;div&gt;    return 0;&lt;/div&gt;&lt;div&gt;}&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1244231074627122549-1627362789842249890?l=yandongliu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://yandongliu.blogspot.com/feeds/1627362789842249890/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1244231074627122549&amp;postID=1627362789842249890' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/1627362789842249890'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/1627362789842249890'/><link rel='alternate' type='text/html' href='http://yandongliu.blogspot.com/2010/02/entropy.html' title='Entropy'/><author><name>Yandong Liu</name><uri>http://www.blogger.com/profile/16061261112583734294</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1244231074627122549.post-8562668263880233854</id><published>2009-11-15T23:29:00.000-08:00</published><updated>2009-11-15T23:38:33.068-08:00</updated><title type='text'>Freecell: probability of having no valid move at start</title><content type='html'>I love FreeCell crazily. When the game begins, you get  8 columns of cards. The only rule to move a card onto top of another one is that the numbers on two cards are continuous. It's possible that all of the numbers on 8 top cards are not continuous to each other so you have to move some to the free decks. How likely this is to happen?&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In practice not very likely. I have played a few hundred games and it didn't happen to me.&lt;br /&gt;&lt;div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;BTW, finding a solution to a FreeCell game is NP-complete (according to Wikipedia), but generally one can finish it within a few minutes. Again it shows NP-complete problems are not that hard to solve in practice.&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1244231074627122549-8562668263880233854?l=yandongliu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://yandongliu.blogspot.com/feeds/8562668263880233854/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1244231074627122549&amp;postID=8562668263880233854' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/8562668263880233854'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/8562668263880233854'/><link rel='alternate' type='text/html' href='http://yandongliu.blogspot.com/2009/11/freecell-probability-of-having-no-valid.html' title='Freecell: probability of having no valid move at start'/><author><name>Yandong Liu</name><uri>http://www.blogger.com/profile/16061261112583734294</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1244231074627122549.post-5723851334220075679</id><published>2009-06-16T23:45:00.000-07:00</published><updated>2009-08-12T00:36:30.950-07:00</updated><title type='text'>Why chomp() doesn't work under Windows?</title><content type='html'>No doubt that chomp() is frequently used in all kinds of Perl scripts. Today, I spent a few seconds on a Perl script file which is to eliminate all the illegal chars of a XML files, expecting to get the work done soon. It wasn't, however. The code looked extremely plain and straightforward, but it simply refused to work. Soon I realized that I was working under Windows - literally the first time I wrote a Perl script under Windows! The problem was EOL, the End-Of-Line mark. While LF('\n') is sufficient to make a new line under *nix, Windows needs CR+LR("\r\n") to separate paragraphs. Here is what I use now:&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(51, 153, 153);"&gt;while&lt;/span&gt;(&lt;span style="color: rgb(153, 0, 0);"&gt;$line&lt;/span&gt; = &lt;&lt;span style="color: rgb(153, 0, 0);"&gt;FILE&lt;/span&gt;&gt;) {&lt;br /&gt;     &lt;span style="color: rgb(153, 0, 0);"&gt;    $line&lt;/span&gt; =~ s/&lt;span style="color: rgb(51, 102, 102);"&gt;\r\n&lt;/span&gt;$//;&lt;br /&gt;     &lt;span style="color: rgb(192, 192, 192);"&gt;    //do your work&lt;/span&gt;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;Hope this helps.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1244231074627122549-5723851334220075679?l=yandongliu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://yandongliu.blogspot.com/feeds/5723851334220075679/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1244231074627122549&amp;postID=5723851334220075679' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/5723851334220075679'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/5723851334220075679'/><link rel='alternate' type='text/html' href='http://yandongliu.blogspot.com/2009/06/why-chomp-doesnt-work-under-windows.html' title='Why chomp() doesn&apos;t work under Windows?'/><author><name>Yandong Liu</name><uri>http://www.blogger.com/profile/16061261112583734294</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1244231074627122549.post-5488469423354001785</id><published>2008-12-19T20:49:00.000-08:00</published><updated>2008-12-19T20:51:06.448-08:00</updated><title type='text'>Business</title><content type='html'>I have always been thinking of starting some online business, or online service. Actually, a website. A website of ...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1244231074627122549-5488469423354001785?l=yandongliu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://yandongliu.blogspot.com/feeds/5488469423354001785/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1244231074627122549&amp;postID=5488469423354001785' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/5488469423354001785'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/5488469423354001785'/><link rel='alternate' type='text/html' href='http://yandongliu.blogspot.com/2008/12/business.html' title='Business'/><author><name>Yandong Liu</name><uri>http://www.blogger.com/profile/16061261112583734294</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1244231074627122549.post-2895114641626260807</id><published>2008-06-05T12:19:00.000-07:00</published><updated>2008-06-05T12:21:56.964-07:00</updated><title type='text'></title><content type='html'>Naive Bayes:&lt;br /&gt;&lt;br /&gt;Semi-supervised:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;use integers&lt;/li&gt;&lt;li&gt;when use unlabeled data, we can make the new label fraction&lt;/li&gt;&lt;li&gt;choose top-k&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1244231074627122549-2895114641626260807?l=yandongliu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://yandongliu.blogspot.com/feeds/2895114641626260807/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1244231074627122549&amp;postID=2895114641626260807' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/2895114641626260807'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/2895114641626260807'/><link rel='alternate' type='text/html' href='http://yandongliu.blogspot.com/2008/06/naive-bayes-semi-supervised-when-use.html' title=''/><author><name>Yandong Liu</name><uri>http://www.blogger.com/profile/16061261112583734294</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1244231074627122549.post-3376583873915290338</id><published>2008-06-05T10:22:00.001-07:00</published><updated>2008-06-05T10:22:10.922-07:00</updated><title type='text'>Topic Modeling</title><content type='html'>&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1244231074627122549-3376583873915290338?l=yandongliu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://yandongliu.blogspot.com/feeds/3376583873915290338/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1244231074627122549&amp;postID=3376583873915290338' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/3376583873915290338'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/3376583873915290338'/><link rel='alternate' type='text/html' href='http://yandongliu.blogspot.com/2008/06/topic-modeling.html' title='Topic Modeling'/><author><name>Yandong Liu</name><uri>http://www.blogger.com/profile/16061261112583734294</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1244231074627122549.post-9085854595975561379</id><published>2008-06-05T10:05:00.000-07:00</published><updated>2008-06-05T10:06:07.341-07:00</updated><title type='text'>Questions</title><content type='html'>thick line: less expressive&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1244231074627122549-9085854595975561379?l=yandongliu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://yandongliu.blogspot.com/feeds/9085854595975561379/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1244231074627122549&amp;postID=9085854595975561379' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/9085854595975561379'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/9085854595975561379'/><link rel='alternate' type='text/html' href='http://yandongliu.blogspot.com/2008/06/questions.html' title='Questions'/><author><name>Yandong Liu</name><uri>http://www.blogger.com/profile/16061261112583734294</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1244231074627122549.post-4059717794016761960</id><published>2008-06-05T10:01:00.000-07:00</published><updated>2008-06-05T10:22:02.210-07:00</updated><title type='text'>Probability</title><content type='html'>&lt;h1 class="firstHeading"&gt;&lt;/h1&gt;&lt;ul&gt;&lt;li&gt;Bayesian probability&lt;/li&gt;&lt;li&gt;Naive Bayes classifier&lt;/li&gt;&lt;li&gt;LDA&lt;/li&gt;&lt;li&gt;LSI&lt;/li&gt;&lt;li&gt;pLSI&lt;/li&gt;&lt;li&gt;matrix factorization&lt;/li&gt;&lt;li&gt;PLSA&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1244231074627122549-4059717794016761960?l=yandongliu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://yandongliu.blogspot.com/feeds/4059717794016761960/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1244231074627122549&amp;postID=4059717794016761960' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/4059717794016761960'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/4059717794016761960'/><link rel='alternate' type='text/html' href='http://yandongliu.blogspot.com/2008/06/probability.html' title='Probability'/><author><name>Yandong Liu</name><uri>http://www.blogger.com/profile/16061261112583734294</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1244231074627122549.post-2610388450002453346</id><published>2008-06-05T09:56:00.001-07:00</published><updated>2010-02-22T00:46:11.824-08:00</updated><title type='text'>Generative Model &amp; Discriminative Model</title><content type='html'>Generative Model&lt;br /&gt;  You know P(X|Y) -&gt; You can generate some data according to the likelihood. You classify by P(Y|X) \prop P(X|Y)P(Y)&lt;br /&gt;&lt;br /&gt;Discriminative Model&lt;br /&gt;  You only know P(Y|X) -&gt; You classify the data directly without knowing the data likelihood&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1244231074627122549-2610388450002453346?l=yandongliu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://yandongliu.blogspot.com/feeds/2610388450002453346/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1244231074627122549&amp;postID=2610388450002453346' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/2610388450002453346'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1244231074627122549/posts/default/2610388450002453346'/><link rel='alternate' type='text/html' href='http://yandongliu.blogspot.com/2008/06/generative-model-discriminative-model.html' title='Generative Model &amp; Discriminative Model'/><author><name>Yandong Liu</name><uri>http://www.blogger.com/profile/16061261112583734294</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
