Saturday, February 13, 2010

Entropy

[Wikipedia]: Shannon's entropy represents an absolute limit on the best possible lossless compression

for a fair coin X: p(head) = p(tail) = 0.5.

H(X) = -2*0.5*log(0.5) = 1.

when it's not fair, let's say p(head) = 0.6, then

H(X) = -0.6*log(0.6)-0.4*log(0.4) = 0.97

The more even the distribution is (uncertainty is high), the more bits needed for encoding.

#include
#include
#include

int main(int argc, char* argv[]) {
if(argc <>
printf("entropy filename\n");
exit(0);
}
char* fn = argv[1];
FILE* fp = fopen(fn,"r");
char line[255];
float c;
float e = 0.0f;
if(fp != NULL) {
while(fscanf(fp,"%f", &c)!=EOF) {
printf("%f\n",c);
e += c*log(c)/log(2);
}
close(fp);
printf("Entropy:%f\n", 0-e);
}
return 0;
}

0 comments: