export type Frequencies = { [key: string]: number };
export type Codebook = { [key: string]: string };
export interface PartitionState {
chars: string[];
freqs: number[];
totalFreq: number;
depth: number;
prefix: string;
parent: number | null;
children: (number | null)[];
splitIndex: number | null;
leftSum: number | null;
rightSum: number | null;
}
export interface TreeNode {
id: string;
name: string;
value: number;
code?: string;
splitIndex?: number | null;
leftSum?: number | null;
rightSum?: number | null;
children: TreeNode[];
}
export interface Statistics {
originalSize: number;
compressedSize: number;
compressionRatio: number;
averageCodeLength: number;
}
The algorithm progresses through several well-defined steps:
The calculateFrequencies
function computes the frequency of each character in the input string.
export const calculateFrequencies = (text: string): Frequencies => {
const freqs: Frequencies = {};
for (let i = 0; i < text.length; i++) {
const char = text[i];
freqs[char] = (freqs[char] || 0) + 1;
}
return freqs;
};
The sortByFrequency
function sorts characters by their frequencies in descending order.
export const sortByFrequency = (freqs: Frequencies): string[] => {
return Object.keys(freqs).sort((a, b) => freqs[b] - freqs[a]);
};
The generatePartitionHistory
function recursively partitions characters into groups with similar total frequencies. It also maintains a history of partition states.
export const generatePartitionHistory = (
chars: string[],
freqs: Frequencies
): PartitionHistory => {
const history: PartitionHistory = [];
// Initialization and recursive partition logic
...
partition(0); // Start partitioning from the root
return history;
};
The partitioning process is recursive and tracks its progress through a PartitionHistory
array. Each entry in this array represents the state of a partition at a specific point in the algorithm.
The root state is created as the starting point of the partitioning process. It includes:
chars
): All characters sorted by frequency.
freqs
): Corresponding frequencies of the characters.
totalFreq
): Sum of all character frequencies.
null
(root has no parent).
splitIndex
, leftSum
, rightSum
) are initialized to null
.Example of the root state:
history.push({
chars: [...chars],
freqs: chars.map((char) => freqs[char]),
totalFreq: chars.reduce((sum, char) => sum + freqs[char], 0),
depth: 0,
prefix: "",
parent: null,
children: [],
splitIndex: null,
leftSum: null,
rightSum: null,
});
The partition
function is called recursively to split the characters into left and right groups. The steps are as follows:
If the partition contains only one character, it cannot be split further. The function returns:
if (state.chars.length <= 1) {
return;
}
The function iterates through the characters to find the best split point where the difference between the sums of the left and right partitions is minimized.
Key variables:
bestSplitIndex
: Index of the character where the split occurs.
minDifference
: Minimum difference between the sums of the left and right partitions.
leftSum
: Running sum of the frequencies of characters on the left.
totalSum
: Total frequency of all characters in the partition.
Logic:
leftSum
: Sum of frequencies on the left of the split.
rightSum
: Sum of frequencies on the right of the split.
difference
: Absolute difference between leftSum
and rightSum
.
bestSplitIndex
if difference
is smaller than minDifference
.
let bestSplitIndex = 0;
let minDifference = Number.POSITIVE_INFINITY;
let leftSum = 0;
const totalSum = state.chars.reduce(
(sum: number, char: string) => sum + freqs[char],
0
);
for (let i = 0; i < state.chars.length - 1; i++) {
leftSum += freqs[state.chars[i]];
const rightSum = totalSum - leftSum;
const difference = Math.abs(leftSum - rightSum);
if (difference < minDifference) {
minDifference = difference;
bestSplitIndex = i;
}
}
Once the optimal split point is found:
splitIndex
is set to bestSplitIndex
.
leftSum
and rightSum
are calculated for the left and right partitions.
state.splitIndex = bestSplitIndex;
state.leftSum = state.chars
.slice(0, bestSplitIndex + 1)
.reduce((sum: number, char: string) => sum + freqs[char], 0);
state.rightSum = state.chars
.slice(bestSplitIndex + 1)
.reduce((sum: number, char: string) => sum + freqs[char], 0);
The partition is split into two groups:
New partition states are created for both groups:
prefix
for the left group is the parent's prefix + "0"
.
prefix
for the right group is the parent's prefix + "1"
.
const leftState = {
chars: leftChars,
freqs: leftChars.map((char) => freqs[char]),
totalFreq: leftChars.reduce((sum, char) => sum + freqs[char], 0),
depth: state.depth + 1,
prefix: state.prefix + "0",
parent: stateIndex,
children: [],
splitIndex: null,
leftSum: null,
rightSum: null,
};
const rightState = {
chars: rightChars,
freqs: rightChars.map((char) => freqs[char]),
totalFreq: rightChars.reduce((sum, char) => sum + freqs[char], 0),
depth: state.depth + 1,
prefix: state.prefix + "1",
parent: stateIndex,
children: [],
splitIndex: null,
leftSum: null,
rightSum: null,
};
history.push(leftState);
history.push(rightState);
The function recursively calls itself for the left and right partitions:
if (leftChars.length > 1) {
partition(leftIndex);
}
if (rightChars.length > 1) {
partition(rightIndex);
}
After all recursive calls are complete, the PartitionHistory
array contains the entire history of partitions, which includes:
This history is used for generating the binary codebook and visualizing the Shannon-Fano tree.
Example Output of Partitioning
For an input string "aabbc":
{a: 2, b: 2, c: 1}
[a, b, c]
[a, b, c]
→ Split into [a, b]
and [c]
[a, b]
: Split into [a]
and [b]
[c]
: Single character, no further split.
Resulting Partition History:
[
{ chars: ['a', 'b', 'c'], prefix: '', splitIndex: 1 },
{ chars: ['a', 'b'], prefix: '0', splitIndex: 0 },
{ chars: ['c'], prefix: '1' },
{ chars: ['a'], prefix: '00' },
{ chars: ['b'], prefix: '01' }
]
The buildCodebook
function assigns binary codes to characters based on the partition history.
export const buildCodebook = (history: PartitionHistory): Codebook => {
const codes: Codebook = {};
history.forEach((state) => {
if (state.chars.length === 1 && state.prefix) {
codes[state.chars[0]] = state.prefix;
}
});
return codes;
};
The encodeString
function converts a string to binary representation using the codebook.
export const encodeString = (text: string, codes: Codebook): string => {
let encoded = "";
for (let i = 0; i < text.length; i++) {
encoded += codes[text[i]];
}
return encoded;
};
The calculateStatistics
function computes metrics like the compression ratio.
export const calculateStatistics = (
text: string,
codes: Codebook
): Statistics => {
const origSize = text.length * 8; // 8 bits per character
const compSize = text
.split("")
.reduce((sum, char) => sum + (codes[char] ? codes[char].length : 0), 0);
const ratio = origSize > 0 ? (1 - compSize / origSize) * 100 : 0;
const avgLength = text.length > 0 ? compSize / text.length : 0;
return {
originalSize: origSize,
compressedSize: compSize,
compressionRatio: ratio,
averageCodeLength: avgLength,
};
};
The buildTreeData
function constructs a tree structure from the partition history for visualization purposes.
export const buildTreeData = (history: PartitionHistory): TreeNode | null => {
if (history.length === 0) return null;
const buildNode = (historyIndex: number): TreeNode => {
const state = history[historyIndex];
if (state.chars.length === 1) {
return {
id: `node-${historyIndex}`,
name: state.chars[0],
value: state.freqs[0],
code: state.prefix,
children: [],
};
}
...
return {
id: `node-${historyIndex}`,
name: state.chars.join(", "),
value: state.totalFreq,
children,
};
};
return buildNode(0); // Build from the root
};
The Shannon-Fano implementation in this repository is a robust and modular representation of the algorithm, complete with visualization and statistical analysis capabilities. While not optimized for practical compression tasks, it serves as an excellent educational tool to demonstrate the principles of entropy coding.
For more details, you can view the code directly here.