Deriving the Maximum Entropy Distribution Under Mean Constraint
Given only the mean of a non-negative continuous random variable, find the probability density that maximizes entropy. This is the principle of maximum entropy (MaxEnt)—assume nothing beyond what you know.
Starting Point
We want to maximize the differential entropy:

$$h[p] = -\int_0^\infty p(x) \ln p(x)\, dx$$

Subject to the constraints:

- Normalization: $\int_0^\infty p(x)\, dx = 1$
- Mean constraint: $\int_0^\infty x\, p(x)\, dx = \mu$
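For concreteness, here is a minimal numerical sketch of the objective (my illustration, not part of the original note), assuming NumPy and SciPy are available; the helper name `differential_entropy` is made up for this example:

```python
import numpy as np
from scipy.integrate import quad

def differential_entropy(p):
    """Numerically evaluate h[p] = -integral of p(x) ln p(x) over [0, inf)."""
    integrand = lambda x: -p(x) * np.log(p(x)) if p(x) > 0 else 0.0
    value, _ = quad(integrand, 0, np.inf)
    return value

# Example: exponential density with mean mu = 2 (illustrative value)
mu = 2.0
print(differential_entropy(lambda x: np.exp(-x / mu) / mu))  # ~ 1 + ln 2 ~ 1.693
```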
Prerequisites
- [[Lagrange Multipliers]]
- [[Shannon Entropy]]
- [[Calculus of Variations]] (basic)
Derivation
Step 1: Set Up the Lagrangian
We form the functional:

$$J[p] = -\int_0^\infty p(x) \ln p(x)\, dx + \lambda_0 \left( \int_0^\infty p(x)\, dx - 1 \right) + \lambda_1 \left( \int_0^\infty x\, p(x)\, dx - \mu \right)$$

where $\lambda_0$ and $\lambda_1$ are Lagrange multipliers.
Step 2: Take the Functional Derivative
For the optimal $p(x)$, the first variation must vanish:

$$\frac{\delta J}{\delta p(x)} = 0$$

Computing term by term:

- Entropy term: $\dfrac{\delta}{\delta p(x)} \left( -\int_0^\infty p \ln p\, dx \right) = -\ln p(x) - 1$
- Normalization term: $\dfrac{\delta}{\delta p(x)} \left( \lambda_0 \int_0^\infty p\, dx \right) = \lambda_0$
- Mean term: $\dfrac{\delta}{\delta p(x)} \left( \lambda_1 \int_0^\infty x\, p\, dx \right) = \lambda_1 x$

Setting the sum to zero:

$$-\ln p(x) - 1 + \lambda_0 + \lambda_1 x = 0$$
Step 3: Solve for $p(x)$
Rearranging:

$$\ln p(x) = \lambda_0 - 1 + \lambda_1 x \quad \Longrightarrow \quad p(x) = e^{\lambda_0 - 1}\, e^{\lambda_1 x}$$

Let $C = e^{\lambda_0 - 1}$ and $\lambda = -\lambda_1$ (normalizability on $[0, \infty)$ requires $\lambda_1 < 0$), so:

$$p(x) = C e^{-\lambda x}$$
This is the exponential distribution!
Step 4: Determine the Constants
From normalization:

$$\int_0^\infty C e^{-\lambda x}\, dx = \frac{C}{\lambda} = 1 \quad \Longrightarrow \quad C = \lambda$$

From the mean constraint:

$$\int_0^\infty x\, \lambda e^{-\lambda x}\, dx = \frac{1}{\lambda} = \mu \quad \Longrightarrow \quad \lambda = \frac{1}{\mu}$$
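A quick numerical sanity check of these constants (a minimal sketch, assuming NumPy and SciPy; the value of $\mu$ is arbitrary):

```python
import numpy as np
from scipy.integrate import quad

mu = 3.0                                # illustrative mean
lam = 1.0 / mu                          # lambda = 1/mu from the mean constraint
p = lambda x: lam * np.exp(-lam * x)    # C = lambda from normalization

norm, _ = quad(p, 0, np.inf)                    # should be ~ 1
mean, _ = quad(lambda x: x * p(x), 0, np.inf)   # should be ~ mu
print(norm, mean)                               # ~ 1.0, 3.0
```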
Result
[!success] Final Result
The maximum entropy distribution for a non-negative random variable with known mean $\mu$ is the exponential distribution:

$$p(x) = \frac{1}{\mu}\, e^{-x/\mu}, \qquad x \ge 0$$
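To see the maximality claim numerically, one can compare the exponential against other non-negative distributions sharing the same mean (a sketch, assuming `scipy.stats`; the candidate set is an arbitrary choice for illustration):

```python
from scipy import stats

mu = 2.0  # common mean for every candidate

candidates = {
    "exponential":        stats.expon(scale=mu),              # mean = mu (the MaxEnt solution)
    "gamma(k=2)":         stats.gamma(a=2, scale=mu / 2),     # mean = k*theta = mu
    "uniform on [0,2mu]": stats.uniform(loc=0, scale=2 * mu), # mean = mu
}

for name, dist in candidates.items():
    print(f"{name:20s} mean = {dist.mean():.3f}   entropy = {dist.entropy():.4f}")

# The exponential attains the largest differential entropy, 1 + ln(mu) ~ 1.6931;
# the other candidates imply extra structure and pay for it in entropy.
```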
Interpretation
This result says: if all you know about a non-negative quantity is its average, you should model it as exponentially distributed.
Why? Because the exponential distribution makes the fewest assumptions beyond what you’ve measured. Any other distribution would imply additional structure you don’t actually know.
This is Jaynes’ key insight: entropy maximization is principled ignorance.
Special Cases
- When $\mu = 1$: Standard exponential, $p(x) = e^{-x}$
- In the limit $\mu \to 0$: The distribution concentrates at $x = 0$ (deterministic)
- In the limit $\mu \to \infty$: The distribution spreads out, approaching uniform (but improper)
Common Mistakes
[!warning] Watch Out
Don’t confuse this with maximizing entropy over all distributions on $[0, \infty)$—that problem is ill-posed (no maximum exists without constraints). The mean constraint is essential.
Verification
Dimensional check: $\mu$ has the units of $x$, so the exponent $x/\mu$ is dimensionless. ✓
Limiting case: As $\mu \to \infty$, entropy $h = 1 + \ln \mu \to \infty$. This makes sense—more spread means more uncertainty. ✓
Alternative derivation: This can also be done via the partition function approach from statistical mechanics, giving the same answer.
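As a sketch of that route (standard statistical mechanics, not spelled out in the original): take $p(x) = e^{-\lambda x} / Z(\lambda)$ and read the mean off the log-partition function:

$$Z(\lambda) = \int_0^\infty e^{-\lambda x}\, dx = \frac{1}{\lambda}, \qquad \langle x \rangle = -\frac{\partial \ln Z}{\partial \lambda} = \frac{1}{\lambda} = \mu \;\Longrightarrow\; \lambda = \frac{1}{\mu}$$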
Sources
- Jaynes, E.T. (1957). “Information Theory and Statistical Mechanics.” Physical Review, 106(4), 620–630.
- Cover, T.M. & Thomas, J.A. Elements of Information Theory, Chapter 12.