<aside> 📌 By Dr. Nir Regev
</aside>
<aside> 📌 For more tutorials like this, visit Circuit of Knowledge
</aside>
Information theory, pioneered by Claude Shannon in 1948, provides a mathematical framework for quantifying, storing, and communicating information. This tutorial covers key concepts including Shannon entropy, joint entropy, mutual information, and information gain, which form the basis for understanding more advanced concepts such as the Kullback-Leibler divergence (KLD) and cross-entropy.
Shannon entropy quantifies the average amount of information contained in a message. For a discrete random variable X with possible values {x₁, x₂, ..., xₙ} and probability mass function P(X), the Shannon entropy H(X) is defined as:
$$ H(X) = -\sum_{i=1}^{n} P(x_i) \log_2 P(x_i) $$
where P(xᵢ) is the probability that X takes the value xᵢ, and the base-2 logarithm means the entropy is measured in bits.
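Two extreme cases make the definition concrete: a fair coin flip is maximally uncertain, while a certain outcome carries no information at all.

$$ H_{\text{fair coin}} = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1 \text{ bit}, \qquad H_{\text{certain}} = -1 \cdot \log_2 1 = 0 \text{ bits} $$

The snippet below computes the entropy of a distribution that sits between these extremes.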
```python
import numpy as np

def shannon_entropy(p):
    """Compute Shannon entropy of a discrete probability distribution."""
    # Remove zero probabilities
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Example
p = np.array([0.5, 0.25, 0.25])
print(f"Shannon entropy: {shannon_entropy(p):.4f} bits")
```

Output:

```
Shannon entropy: 1.5000 bits
```
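The printed value matches a hand calculation straight from the definition for the distribution (0.5, 0.25, 0.25):

$$ H(X) = -\left( \tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{4}\log_2\tfrac{1}{4} + \tfrac{1}{4}\log_2\tfrac{1}{4} \right) = \tfrac{1}{2} + \tfrac{1}{2} + \tfrac{1}{2} = 1.5 \text{ bits} $$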
For two discrete random variables X and Y, the joint entropy H(X,Y) is defined as:
$$ H(X,Y) = -\sum_{x}\sum_{y} P(x,y) \log_2 P(x,y) $$
where P(x,y) is the joint probability that X = x and Y = y.
```python
import numpy as np

def joint_entropy(p_xy):
    """Compute joint entropy of two discrete random variables."""
    # Remove zero probabilities
    p_xy = p_xy[p_xy > 0]
    return -np.sum(p_xy * np.log2(p_xy))

# Example
p_xy = np.array([[0.2, 0.1], [0.3, 0.4]])
print(f"Joint entropy: {joint_entropy(p_xy):.4f} bits")
```

Output:

```
Joint entropy: 1.8464 bits
```
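To tie the two definitions together, the marginal entropies can be recovered from the same joint table and compared with the joint entropy. The following is a minimal sketch rather than part of the tutorial's code: entropy_bits is a hypothetical helper equivalent to the functions above, and the rows of p_xy are assumed to index X while the columns index Y.

```python
import numpy as np

def entropy_bits(p):
    """Entropy in bits of an array of probabilities (zeros ignored)."""
    # Hypothetical helper: works for 1-D marginals and 2-D joint tables,
    # since boolean indexing flattens the array either way.
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Same joint table as in the example above.
# Assumption: rows index X, columns index Y.
p_xy = np.array([[0.2, 0.1], [0.3, 0.4]])

p_x = p_xy.sum(axis=1)  # marginal distribution of X
p_y = p_xy.sum(axis=0)  # marginal distribution of Y

H_x, H_y, H_xy = entropy_bits(p_x), entropy_bits(p_y), entropy_bits(p_xy)
print(f"H(X) = {H_x:.4f} bits, H(Y) = {H_y:.4f} bits, H(X,Y) = {H_xy:.4f} bits")
print(f"H(X) + H(Y) - H(X,Y) = {H_x + H_y - H_xy:.4f} bits")
```

The difference H(X) + H(Y) - H(X,Y) is never negative; it is exactly the mutual information mentioned in the introduction, and it vanishes only when X and Y are independent.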