P(Y=1|X) = 1 / (1 + exp(w_0 + \sum_i w_i X_i))
P(Y=0|X) = exp(w_0 + \sum_i w_i X_i) / (1 + exp(w_0 + \sum_i w_i X_i))
log(P(Y=0|X) / P(Y=1|X)) = w_0 + \sum_i w_i X_i. So LR has a linear decision boundary.
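
As a quick sanity check on the formulas above, here is a minimal Python sketch. The function names and the example weights are made up for illustration; the sign convention (a positive linear score favors Y=0) is the one used in these notes.

```python
import math

def linear_score(x, w0, w):
    """w_0 + sum_i w_i * x_i (hypothetical helper)."""
    return w0 + sum(wi * xi for wi, xi in zip(w, x))

def p_y1(x, w0, w):
    """P(Y=1|X) = 1 / (1 + exp(score)), the convention used in these notes."""
    return 1.0 / (1.0 + math.exp(linear_score(x, w0, w)))

def p_y0(x, w0, w):
    """P(Y=0|X) = exp(score) / (1 + exp(score)) = 1 - P(Y=1|X)."""
    return 1.0 - p_y1(x, w0, w)

def predict(x, w0, w):
    """Predict Y=0 when the score is positive, i.e. P(Y=0|X) > P(Y=1|X)."""
    return 0 if linear_score(x, w0, w) > 0 else 1

# Illustrative values only.
w0, w = 0.5, [1.0, -2.0]
x = [0.3, 0.1]
log_odds = math.log(p_y0(x, w0, w) / p_y1(x, w0, w))
```

The log-odds computed from the two probabilities should match the linear score, which is why the decision rule reduces to checking the sign of a linear function of X.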
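It's easy to show that LR and GNB share the same functional form for P(Y|X) when GNB's per-feature variances do not depend on the class.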
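
The link between GNB and LR can be made concrete with a short worked formula. This is a sketch under assumptions not stated in the notes: \pi = P(Y=1), each X_i given Y=k is Gaussian with mean \mu_{ik}, and the variance \sigma_i^2 is shared by both classes. Applying Bayes' rule and the conditional independence assumption then gives exactly the logistic form used above, with weights determined by the GNB parameters:

```latex
P(Y=1 \mid X) = \frac{1}{1 + \exp\left(w_0 + \sum_i w_i X_i\right)}, \qquad
w_i = \frac{\mu_{i0} - \mu_{i1}}{\sigma_i^2}, \qquad
w_0 = \ln\frac{1-\pi}{\pi} + \sum_i \frac{\mu_{i1}^2 - \mu_{i0}^2}{2\sigma_i^2}
```

If the variances differ between classes, the quadratic terms in X_i no longer cancel and the boundary is no longer linear.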
LR doesn't require the conditional independence assumption, while Naive Bayes does. When that assumption doesn't hold, LR often outperforms GNB.
When GNB's modeling assumptions hold, LR and GNB converge to the same classifier as the number of training examples approaches infinity.
GNB converges toward its asymptotic accuracy faster than LR:
When the dataset is small, GNB tends to outperform LR.
When the dataset is large, LR tends to outperform GNB.
LR estimates P(Y|X) directly, hence discriminative. NB estimates P(X|Y) and P(Y), hence generative.
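GNB's parameters have closed-form maximum likelihood estimates, so why use gradient ascent for LR? Because LR maximizes the conditional likelihood of Y given X directly, without assuming GNB's feature independence, and that maximization has no closed-form solution.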
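
To illustrate the difference in how the two models are fit, here is a minimal single-feature sketch in Python. Everything in it is an assumption for illustration: the function names, the toy data, the learning rate, and the step count. GNB's estimates come straight from counts, means, and variances, while LR's weights are found by iterating gradient ascent on the conditional log-likelihood.

```python
import math

def gnb_mle(xs, ys):
    """Closed-form MLE for a one-feature GNB: class prior, mean, variance."""
    params = {}
    for label in (0, 1):
        vals = [x for x, y in zip(xs, ys) if y == label]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        params[label] = {"prior": len(vals) / len(xs), "mean": mean, "var": var}
    return params

def lr_gradient_ascent(xs, ys, rate=0.1, steps=5000):
    """Fit w0, w1 by gradient ascent on the conditional log-likelihood,
    using the convention P(Y=1|x) = 1 / (1 + exp(w0 + w1 * x))."""
    w0 = w1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p1 = 1.0 / (1.0 + math.exp(w0 + w1 * x))
            # Under this sign convention, d(log-likelihood)/d(score) = p1 - y.
            g0 += p1 - y
            g1 += (p1 - y) * x
        w0 += rate * g0 / n
        w1 += rate * g1 / n
    return w0, w1

# Toy, deliberately non-separable data (illustrative only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 1, 0, 1, 1]

gnb = gnb_mle(xs, ys)
w0, w1 = lr_gradient_ascent(xs, ys)

def lr_p1(x):
    return 1.0 / (1.0 + math.exp(w0 + w1 * x))
```

Note that `gnb_mle` makes a single pass over the data, whereas `lr_gradient_ascent` has to loop until the weights settle.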
Thursday, April 22, 2010