Grinstead 4.2: Entropy of a Biased Coin
Problem Statement
A biased coin has probability $p$ of heads and $1-p$ of tails.
(a) Find the entropy $H(p)$ as a function of $p$. (b) For what value of $p$ is the entropy maximized? (c) Sketch the graph of $H(p)$ for $0 \le p \le 1$.
- Two outcomes: Heads (probability $p$), Tails (probability $1-p$)
- $H(p)$ — the entropy as a function of bias
- $p^*$ — the value of $p$ that maximizes entropy
- Shape of the entropy curve
Solution
Approach
Apply the Shannon entropy formula directly, then use calculus to find the maximum.
Part (a): Find $H(p)$
By definition of Shannon entropy:

$$H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)$$

This is called the binary entropy function, often denoted $H_b(p)$ or $h(p)$.
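As a quick sanity check, the formula is easy to evaluate directly. A minimal Python sketch (the name `binary_entropy` is mine, not from the text):

```python
import math

def binary_entropy(p: float) -> float:
    """Shannon entropy of a Bernoulli(p) source, in bits.

    The limits p -> 0 and p -> 1 give H = 0, since x log x -> 0.
    """
    if p == 0.0 or p == 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.5))  # maximum uncertainty: 1.0 bit
print(binary_entropy(0.9))  # biased coin: strictly less than 1 bit
```

Note the explicit endpoint check: `math.log2(0)` raises a `ValueError`, so the $0 \log 0 = 0$ convention has to be handled by hand.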
Part (b): Maximize Entropy
Take the derivative:

$$H'(p) = -\log_2 p - \frac{1}{\ln 2} + \log_2(1 - p) + \frac{1}{\ln 2} = \log_2 \frac{1 - p}{p}$$

Setting $H'(p) = 0$:

$$\log_2 \frac{1 - p}{p} = 0 \quad\Longrightarrow\quad \frac{1 - p}{p} = 1 \quad\Longrightarrow\quad p = \frac{1}{2}$$

Verification: The second derivative is:

$$H''(p) = -\frac{1}{p(1 - p)\ln 2}$$

Since the second derivative is negative everywhere on $(0, 1)$, $p = \frac{1}{2}$ is indeed a maximum.
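The calculus can be double-checked numerically: the derivative $\log_2\frac{1-p}{p}$ changes sign at $p = 1/2$, and a grid search over $(0, 1)$ peaks there as well. A rough sketch (the helper names `H` and `dH` are mine):

```python
import math

def H(p: float) -> float:
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def dH(p: float) -> float:
    # Analytic derivative: H'(p) = log2((1 - p) / p)
    return math.log2((1 - p) / p)

# The derivative is positive below 1/2, zero at 1/2, negative above it.
print(dH(0.25) > 0, dH(0.5) == 0.0, dH(0.75) < 0)

# Brute-force check: entropy over a fine grid is maximized at p = 1/2.
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=H)
print(best)  # 0.5
```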
Part (c): Sketch
Key points:
- $H(0) = 0$ (certain tails)
- $H(1/2) = 1$ bit (maximum uncertainty)
- $H(1) = 0$ (certain heads)
- Symmetric about $p = 1/2$: $H(p) = H(1 - p)$
- Concave (arch-shaped, opening downward)
```
H(p)
 1 |        ___
   |      /     \
   |    /         \
   |   /           \
 0 |/_________________\___
   0       0.5        1   p
```

Final Answer
> [!success] Answer
> (a) $H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)$
> (b) Maximum at $p = \frac{1}{2}$, where $H\left(\frac{1}{2}\right) = 1$ bit
> (c) Symmetric, concave curve with max at center, zeros at endpoints
Discussion
The binary entropy function is fundamental; it appears throughout information theory:
- Channel capacity of the binary symmetric channel: $C = 1 - H(p)$
- Bounds on error-correcting codes
- Rate-distortion theory for binary sources
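As one illustration, the capacity of a binary symmetric channel with crossover probability $p$ is a one-liner once the binary entropy function is available. A sketch (`bsc_capacity` is my naming, not a standard API):

```python
import math

def H(p: float) -> float:
    if p == 0.0 or p == 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(crossover: float) -> float:
    """Capacity C = 1 - H(p) of a binary symmetric channel, bits per use."""
    return 1.0 - H(crossover)

print(bsc_capacity(0.0))   # noiseless channel: 1.0 bit per use
print(bsc_capacity(0.5))   # pure noise: 0.0 bits per use
print(bsc_capacity(0.11))  # roughly 0.5 bits per use
```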
The symmetry reflects that a coin biased toward heads has the same entropy as one equally biased toward tails. What matters is the degree of bias, not its direction.
Variations
Section titled “Variations”- What if we use natural log? Same shape, but maximum is nats.
- What about a three-sided die? See [[Ternary Entropy]].
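The natural-log variation is easy to verify: replacing $\log_2$ with $\ln$ rescales the whole curve, so the peak at $p = 1/2$ becomes $\ln 2$ nats instead of $1$ bit. A small sketch (`H_nats` is my name for it):

```python
import math

def H_nats(p: float) -> float:
    # Same entropy curve, measured in nats (natural log instead of base 2).
    if p == 0.0 or p == 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

peak = H_nats(0.5)
print(peak, math.log(2))  # both are ln 2, about 0.6931
```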
Related Problems
- [[Conditional Entropy of Binary Channel]]
- [[Mutual Information of BSC]]
Mistakes I Made
First attempt: forgot to apply the chain rule when differentiating $H(p)$. The base-2 log introduces a factor of $\frac{1}{\ln 2}$ (from $\frac{d}{dx} \log_2 x = \frac{1}{x \ln 2}$) that’s easy to miss.
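That kind of slip is easy to catch numerically: compare the analytic derivative against a central finite difference of $H$ at a few points. A quick sketch (function names are mine):

```python
import math

def H(p: float) -> float:
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def dH_analytic(p: float) -> float:
    # Correct derivative: the 1/ln(2) terms from d/dx log2(x) = 1/(x ln 2)
    # cancel between the two pieces, leaving H'(p) = log2((1 - p) / p).
    return math.log2((1 - p) / p)

def dH_numeric(p: float, h: float = 1e-6) -> float:
    # Central finite difference as an independent check.
    return (H(p + h) - H(p - h)) / (2 * h)

for p in (0.2, 0.4, 0.7):
    print(abs(dH_analytic(p) - dH_numeric(p)) < 1e-6)  # True each time
```

If the $\frac{1}{\ln 2}$ factor had been dropped, the analytic and numeric values would disagree at every point, flagging the error immediately.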