# Shannon Entropy
## Definition

Shannon entropy is a measure of the average uncertainty (or “surprise”) associated with a random variable. For a discrete random variable $X$ with possible outcomes $x_1, \ldots, x_n$ and probability mass function $p(x)$, the entropy is defined as:

$$H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)$$

By convention, $0 \log 0 = 0$ (justified by continuity).
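A minimal computational sketch of the definition (Python; the function name `shannon_entropy` is mine, not from the sources):

```python
import math

def shannon_entropy(probs, base=2):
    """Entropy of a discrete distribution given as a list of probabilities.
    Terms with p = 0 are skipped, matching the 0 * log(0) = 0 convention."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))      # 1.0 bit    (fair coin)
print(shannon_entropy([0.99, 0.01]))    # ~0.081 bits (heavily biased coin)
print(shannon_entropy([1/6] * 6))       # ~2.585 bits (fair die)
```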
## Intuition

Entropy measures how surprised you expect to be when you learn the outcome of a random variable.
- If you flip a fair coin, each outcome is equally likely—maximum surprise, maximum entropy.
- If you flip a biased coin that lands heads 99% of the time, you’re rarely surprised—low entropy.
The key insight: entropy is the answer to “how many yes/no questions do I need, on average, to identify the outcome?”
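One way to make the “expected surprise” reading concrete is to compute the surprisal $-\log_2 p(x)$ of each outcome and weight it by how often that outcome occurs. A sketch (variable names are illustrative):

```python
import math

def surprisal_bits(p):
    """Surprise of observing an outcome that has probability p, in bits."""
    return -math.log2(p)

# Biased coin: heads 99% of the time.
p_heads, p_tails = 0.99, 0.01
print(surprisal_bits(p_heads))   # ~0.014 bits -- almost no surprise
print(surprisal_bits(p_tails))   # ~6.64 bits  -- rare outcome, big surprise

# Entropy is the expected surprisal: each surprise weighted by its probability.
entropy = p_heads * surprisal_bits(p_heads) + p_tails * surprisal_bits(p_tails)
print(entropy)                   # ~0.081 bits, far below the fair coin's 1 bit
```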
## Mathematical Formulation

### Bits as Units

When using $\log_2$, entropy is measured in bits. One bit is the entropy of a fair coin flip:

$$H = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1 \text{ bit}$$
### Alternative Bases

- $\ln$ (natural log): entropy in nats
- $\log_{10}$: entropy in hartleys (rarely used)

Conversion: $H_b(X) = (\log_b a)\, H_a(X)$; for example, $1$ nat $= \log_2 e \approx 1.443$ bits.
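As a sketch, converting an entropy value between bases is just multiplication by a constant (the helper name is mine):

```python
import math

def convert_entropy(value, from_base, to_base):
    """Convert an entropy value between log bases: H_b(X) = (log_b a) * H_a(X)."""
    return value * math.log(from_base, to_base)

print(convert_entropy(1.0, 2, math.e))   # 1 bit ~ 0.693 nats
print(convert_entropy(1.0, math.e, 2))   # 1 nat ~ 1.443 bits
```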
## Key Properties

- Non-negativity: $H(X) \ge 0$, with equality iff $X$ is deterministic.
- Maximum entropy: for $n$ outcomes, $H(X) \le \log n$, with equality iff $X$ is uniform.
- Additivity for independent variables: $H(X, Y) = H(X) + H(Y)$ when $X$ and $Y$ are independent.
- Concavity: $H(p)$ is concave as a function of the distribution $p$.
- Chain rule: $H(X, Y) = H(X) + H(Y \mid X)$ (see the numerical check below).
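To make the chain rule concrete, here is a small numerical check on a made-up 2×2 joint distribution (the probabilities are purely illustrative):

```python
import math

def H(probs):
    """Shannon entropy (in bits) of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Dependent joint distribution p(x, y); rows index x, columns index y.
pxy = [[0.4, 0.1],
       [0.1, 0.4]]

px = [sum(row) for row in pxy]                 # marginal of X: [0.5, 0.5]
joint = [p for row in pxy for p in row]

# H(Y | X) = sum over x of p(x) * H(Y | X = x)
H_Y_given_X = sum(px[i] * H([p / px[i] for p in pxy[i]]) for i in range(len(pxy)))

print(H(joint))                 # ~1.722 bits = H(X, Y)
print(H(px) + H_Y_given_X)      # ~1.722 bits = H(X) + H(Y | X), matching the chain rule
```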
## Examples

### Example 1: Fair Die

A fair six-sided die has:

$$H = -\sum_{i=1}^{6} \tfrac{1}{6} \log_2 \tfrac{1}{6} = \log_2 6 \approx 2.585 \text{ bits}$$

You need about 2.6 yes/no questions on average to identify which face came up.
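The “2.6 questions” reading can be checked against an actual questioning strategy built by Huffman’s algorithm, which averages about 2.67 questions for a fair die; the entropy is the lower bound no strategy can beat. A sketch (not from the sources):

```python
import heapq

def huffman_lengths(probs):
    """Codeword length (number of yes/no questions) per outcome, via Huffman merging."""
    heap = [(p, i, [i]) for i, p in enumerate(probs)]   # (prob, tie-breaker, outcome group)
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tie = len(probs)
    while len(heap) > 1:
        p1, _, g1 = heapq.heappop(heap)
        p2, _, g2 = heapq.heappop(heap)
        for i in g1 + g2:
            lengths[i] += 1        # everything in the merged pair costs one more question
        heapq.heappush(heap, (p1 + p2, tie, g1 + g2))
        tie += 1
    return lengths

probs = [1/6] * 6
lengths = huffman_lengths(probs)
print(sorted(lengths))                              # [2, 2, 3, 3, 3, 3]
print(sum(p * l for p, l in zip(probs, lengths)))   # ~2.667, just above H ~ 2.585
```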
### Example 2: English Letters

If all 26 letters were equally likely:

$$H = \log_2 26 \approx 4.7 \text{ bits per letter}$$
But English has non-uniform letter frequencies and strong dependencies between letters. Shannon estimated the true entropy of English text at roughly 1 bit per letter (his experiments gave a range of about 0.6–1.3 bits per character).

This gap (roughly 3.7 bits) is redundancy—it’s why compression works.
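A rough way to see the frequency effect is to estimate the first-order letter entropy of a text sample; this ignores dependencies between letters, so it still overestimates the true per-letter entropy. A sketch (the sample string is arbitrary):

```python
import math
from collections import Counter

def letter_entropy(text):
    """Empirical entropy (bits per letter) of the letter frequencies in `text`."""
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter(letters)
    n = len(letters)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

sample = "information is the resolution of uncertainty and entropy measures it"
print(letter_entropy(sample))   # first-order estimate for this toy sample
print(math.log2(26))            # ~4.70 bits, the uniform upper bound
```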
## Connections

- Relates to: [[Boltzmann Entropy]], [[Kullback-Leibler Divergence]], [[Mutual Information]]
- Required for: [[Rate-Distortion Theory]], [[Channel Capacity]], [[Source Coding Theorem]]
- Generalizes: [[Differential Entropy]] (continuous case)
## Sources

- Shannon, C. (1948). “A Mathematical Theory of Communication”
- Cover & Thomas, Elements of Information Theory, Chapter 2
- MacKay, Information Theory, Inference, and Learning Algorithms, Chapter 2
## Open Questions

- How does the choice of logarithm base affect information-theoretic arguments in the meme framework?
- What’s the natural “base” for measuring memetic entropy—bits (binary), or something else?