Multimedia Report

1. Data Structures

Frequencies: A mapping of characters to their frequencies.

export type Frequencies = { [key: string]: number };

Codebook: A mapping of characters to their binary codes.

export type Codebook = { [key: string]: string };

PartitionState: Represents the state of each partition during the algorithm's execution.

export interface PartitionState {

  chars: string[];

  freqs: number[];

  totalFreq: number;

  depth: number;

  prefix: string;

  parent: number | null;

  children: (number | null)[];

  splitIndex: number | null;

  leftSum: number | null;

  rightSum: number | null;

}

TreeNode: Represents a node in the Shannon-Fano tree for visualization purposes.

export interface TreeNode {

  id: string;

  name: string;

  value: number;

  code?: string;

  splitIndex?: number | null;

  leftSum?: number | null;

  rightSum?: number | null;

  children: TreeNode[];

}

Statistics: Stores metrics such as compression ratio and average code length.

export interface Statistics {

  originalSize: number;

  compressedSize: number;

  compressionRatio: number;

  averageCodeLength: number;

}

2. Algorithm Steps

The algorithm progresses through several well-defined steps:

Step 1: Frequency Analysis

The calculateFrequencies function computes the frequency of each character in the input string.

export const calculateFrequencies = (text: string): Frequencies => {

  const freqs: Frequencies = {};

  for (let i = 0; i < text.length; i++) {

    const char = text[i];

    freqs[char] = (freqs[char] || 0) + 1;

  }

  return freqs;

};

Step 2: Sorting

The sortByFrequency function sorts characters by their frequencies in descending order.

export const sortByFrequency = (freqs: Frequencies): string[] => {

  return Object.keys(freqs).sort((a, b) => freqs[b] - freqs[a]);

};

Step 3: Partitioning

The generatePartitionHistory function recursively partitions characters into groups with similar total frequencies. It also maintains a history of partition states.

export const generatePartitionHistory = (

  chars: string[],

  freqs: Frequencies

): PartitionHistory => {

  const history: PartitionHistory = [];

  // Initialization and recursive partition logic

  ...

  partition(0); // Start partitioning from the root

  return history;

};

The partitioning process is recursive and tracks its progress through a PartitionHistory array. Each entry in this array represents the state of a partition at a specific point in the algorithm.

3.1. Initialization

The root state is created as the starting point of the partitioning process. It includes:

Characters (chars): All characters sorted by frequency.
Frequencies (freqs): Corresponding frequencies of the characters.
Total Frequency (totalFreq): Sum of all character frequencies.
Depth: The depth of the partition in the tree (starts from 0).
Prefix: Empty string for the root (binary code for this partition).
Parent: null (root has no parent).
Children: Empty array (to be filled when split into left and right partitions).
Split Information: All split-related fields (splitIndex, leftSum, rightSum) are initialized to null.

Example of the root state:

history.push({

  chars: [...chars],

  freqs: chars.map((char) => freqs[char]),

  totalFreq: chars.reduce((sum, char) => sum + freqs[char], 0),

  depth: 0,

  prefix: "",

  parent: null,

  children: [],

  splitIndex: null,

  leftSum: null,

  rightSum: null,

});

3.2. Recursive Partitioning

The partition function is called recursively to split the characters into left and right groups. The steps are as follows:

3.2.1 Base Case

If the partition contains only one character, it cannot be split further. The function returns:

if (state.chars.length <= 1) {

  return;

}

3.2.2 Finding the Optimal Split Point

The function iterates through the characters to find the best split point where the difference between the sums of the left and right partitions is minimized.

Key variables:

bestSplitIndex: Index of the character where the split occurs.
minDifference: Minimum difference between the sums of the left and right partitions.
leftSum: Running sum of the frequencies of characters on the left.
totalSum: Total frequency of all characters in the partition.

Logic:

Iterate over the characters.
For each character, calculate:

leftSum: Sum of frequencies on the left of the split.
rightSum: Sum of frequencies on the right of the split.
difference: Absolute difference between leftSum and rightSum.

Update bestSplitIndex if difference is smaller than minDifference.

let bestSplitIndex = 0;

let minDifference = Number.POSITIVE_INFINITY;

let leftSum = 0;

const totalSum = state.chars.reduce(

  (sum: number, char: string) => sum + freqs[char],

  0

);



for (let i = 0; i < state.chars.length - 1; i++) {

  leftSum += freqs[state.chars[i]];

  const rightSum = totalSum - leftSum;

  const difference = Math.abs(leftSum - rightSum);



  if (difference < minDifference) {

    minDifference = difference;

    bestSplitIndex = i;

  }

}

3.2.3 Updating Partition State

Once the optimal split point is found:

splitIndex is set to bestSplitIndex.
leftSum and rightSum are calculated for the left and right partitions.

state.splitIndex = bestSplitIndex;

state.leftSum = state.chars

  .slice(0, bestSplitIndex + 1)

  .reduce((sum: number, char: string) => sum + freqs[char], 0);

state.rightSum = state.chars

  .slice(bestSplitIndex + 1)

  .reduce((sum: number, char: string) => sum + freqs[char], 0);

3.3. Creating Left and Right Partitions

The partition is split into two groups:

Left Group: Characters from the start to the split index.
Right Group: Characters from the split index + 1 to the end.

New partition states are created for both groups:

The prefix for the left group is the parent's prefix + "0".
The prefix for the right group is the parent's prefix + "1".

const leftState = {

  chars: leftChars,

  freqs: leftChars.map((char) => freqs[char]),

  totalFreq: leftChars.reduce((sum, char) => sum + freqs[char], 0),

  depth: state.depth + 1,

  prefix: state.prefix + "0",

  parent: stateIndex,

  children: [],

  splitIndex: null,

  leftSum: null,

  rightSum: null,

};



const rightState = {

  chars: rightChars,

  freqs: rightChars.map((char) => freqs[char]),

  totalFreq: rightChars.reduce((sum, char) => sum + freqs[char], 0),

  depth: state.depth + 1,

  prefix: state.prefix + "1",

  parent: stateIndex,

  children: [],

  splitIndex: null,

  leftSum: null,

  rightSum: null,

};



history.push(leftState);

history.push(rightState);

3.4. Recursive Calls

The function recursively calls itself for the left and right partitions:

if (leftChars.length > 1) {

  partition(leftIndex);

}

if (rightChars.length > 1) {

  partition(rightIndex);

}

3.5. Final Output: Partition History

After all recursive calls are complete, the PartitionHistory array contains the entire history of partitions, which includes:

Characters and frequencies in each partition.
Binary prefixes assigned to each group.
Parent-child relationships between partitions.

This history is used for generating the binary codebook and visualizing the Shannon-Fano tree.

Example Output of Partitioning

For an input string "aabbc":

Frequencies: {a: 2, b: 2, c: 1}
Sorted Characters: [a, b, c]
Partition Process:

Root: [a, b, c] → Split into [a, b] and [c]
Left Partition [a, b]: Split into [a] and [b]
Right Partition [c]: Single character, no further split.

Resulting Partition History:

[

  { chars: ['a', 'b', 'c'], prefix: '', splitIndex: 1 },

  { chars: ['a', 'b'], prefix: '0', splitIndex: 0 },

  { chars: ['c'], prefix: '1' },

  { chars: ['a'], prefix: '00' },

  { chars: ['b'], prefix: '01' }

]

Step 4: Code Assignment

The buildCodebook function assigns binary codes to characters based on the partition history.

export const buildCodebook = (history: PartitionHistory): Codebook => {

  const codes: Codebook = {};

  history.forEach((state) => {

    if (state.chars.length === 1 && state.prefix) {

      codes[state.chars[0]] = state.prefix;

    }

  });

  return codes;

};

Step 5: Encoding

The encodeString function converts a string to binary representation using the codebook.

export const encodeString = (text: string, codes: Codebook): string => {

  let encoded = "";

  for (let i = 0; i < text.length; i++) {

    encoded += codes[text[i]];

  }

  return encoded;

};

Step 6: Statistics Calculation

The calculateStatistics function computes metrics like the compression ratio.

export const calculateStatistics = (

  text: string,

  codes: Codebook

): Statistics => {

  const origSize = text.length * 8; // 8 bits per character

  const compSize = text

    .split("")

    .reduce((sum, char) => sum + (codes[char] ? codes[char].length : 0), 0);

  const ratio = origSize > 0 ? (1 - compSize / origSize) * 100 : 0;

  const avgLength = text.length > 0 ? compSize / text.length : 0;



  return {

    originalSize: origSize,

    compressedSize: compSize,

    compressionRatio: ratio,

    averageCodeLength: avgLength,

  };

};

3. Tree Visualization

The buildTreeData function constructs a tree structure from the partition history for visualization purposes.

export const buildTreeData = (history: PartitionHistory): TreeNode | null => {

  if (history.length === 0) return null;



  const buildNode = (historyIndex: number): TreeNode => {

    const state = history[historyIndex];

    if (state.chars.length === 1) {

      return {

        id: `node-${historyIndex}`,

        name: state.chars[0],

        value: state.freqs[0],

        code: state.prefix,

        children: [],

      };

    }

    ...

    return {

      id: `node-${historyIndex}`,

      name: state.chars.join(", "),

      value: state.totalFreq,

      children,

    };

  };



  return buildNode(0); // Build from the root

};

The Shannon-Fano implementation in this repository is a robust and modular representation of the algorithm, complete with visualization and statistical analysis capabilities. While not optimized for practical compression tasks, it serves as an excellent educational tool to demonstrate the principles of entropy coding.

For more details, you can view the code directly here.