Blocksort for BWT compression (ranish.com)

A blocksort (suffix sorting) algorithm for BWT compression. It runs in O(n) time on typical data and uses 8n bytes of memory for a block of n bytes.
It is based on Larsson and Sadakane's Faster Suffix Sorting algorithm, with their ternary-split quicksort replaced by linear-time group sorting on linked lists.
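For context, this is how a sorted suffix order turns into BWT output. It is a minimal Python sketch, not the author's code: `bwt_from_suffix_array` is a hypothetical name, the suffix sort is a naive `sorted` call standing in for the blocksort, and it assumes the common sentinel convention (an implicit end-of-block marker smaller than every byte) rather than the cyclic-rotation sort some BWT compressors use. The returned `primary` index records the row where the sentinel would appear.

```python
def bwt_from_suffix_array(data: bytes) -> tuple[bytes, int]:
    """Return (BWT of data without the sentinel, row index of the sentinel)."""
    n = len(data)
    # Naive suffix sort, for illustration only (a blocksort would go here).
    # The empty suffix data[n:] == b"" plays the sentinel and sorts first;
    # bytes comparison puts a proper prefix before its extensions, which
    # matches a sentinel smaller than every byte.
    sa = sorted(range(n + 1), key=lambda i: data[i:])
    out = bytearray()
    primary = -1
    for row, i in enumerate(sa):
        if i == 0:
            # The whole block is preceded only by the sentinel: record its row.
            primary = row
        else:
            # Output the byte that precedes suffix i.
            out.append(data[i - 1])
    return bytes(out), primary
```

For `b"banana"` it returns `(b"annbaa", 4)`, which is the familiar `annb$aa` with the sentinel removed and its position kept separately.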

Like Larsson and Sadakane's algorithm, it starts by building the suffix array with a radix sort.
Then, doubling the sort depth on each pass (1, 2, 4, ... characters), it maintains three linked lists:
the groups of unsorted elements, the sorted elements that are suffixes of unsorted elements,
and the sorted elements that are suffixes of sorted elements.
Once elements reach the third list, they are skipped as whole groups on all subsequent passes.
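The pass structure can be illustrated with a simplified prefix-doubling sort in the Manber-Myers / Larsson-Sadakane style. This Python sketch is a stand-in, not the linked-list method described here: it uses a bucket sort on the first byte, then refines each unsorted group with Python's `sorted` on the rank of the suffix h positions later, and it drops groups that shrink to one element so they are never revisited (only a rough analogue of skipping sorted elements). The name `suffix_array_doubling` is hypothetical, and Python lists do not reproduce the 8n-byte memory bound.

```python
def suffix_array_doubling(data: bytes) -> list[int]:
    n = len(data)
    # Pass 0: bucket (radix) sort of suffixes by their first byte.
    buckets: list[list[int]] = [[] for _ in range(256)]
    for i, c in enumerate(data):
        buckets[c].append(i)
    sa = [i for bucket in buckets for i in bucket]

    # rank[i] = start index in sa of the group holding suffix i.
    # groups = half-open ranges of sa whose suffixes are not yet ordered.
    rank = [0] * n
    groups: list[tuple[int, int]] = []
    start = 0
    for bucket in buckets:
        for i in bucket:
            rank[i] = start
        if len(bucket) > 1:
            groups.append((start, start + len(bucket)))
        start += len(bucket)

    h = 1  # sort depth: suffixes in one group share their first h bytes
    while groups:
        # Secondary key: group of the suffix h bytes later; -1 marks the end
        # of the block, which sorts before every real group.
        def key(i: int) -> int:
            return rank[i + h] if i + h < n else -1

        # Sort every unsorted group using the ranks from the start of this pass.
        pending = []
        for lo, hi in groups:
            block = sorted(sa[lo:hi], key=key)
            pending.append((lo, block, [key(i) for i in block]))

        # Write back, split each group on equal keys, and keep only the
        # subgroups that are still ambiguous; singletons are never revisited.
        groups = []
        for lo, block, keys in pending:
            sa[lo:lo + len(block)] = block
            j = 0
            while j < len(block):
                k = j + 1
                while k < len(block) and keys[k] == keys[j]:
                    k += 1
                for i in block[j:k]:
                    rank[i] = lo + j
                if k - j > 1:
                    groups.append((lo + j, lo + k))
                j = k
        h *= 2
    return sa
```

For `b"banana"` this returns `[5, 3, 1, 0, 4, 2]`, and for `b"aaaa"` it returns `[3, 2, 1, 0]`. Computing all keys before updating any rank keeps each pass a plain doubling step; Larsson and Sadakane's in-place rank updates are deliberately not reproduced.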

Total time is O(n + Σ m·log m), where the sum runs over every pair of matching strings and m is the length of their match.
Since m is bounded by the data content rather than by the block size n, the algorithm runs in linear time
with respect to n. The worst case, however, is O(n·log n), reached for example by a file consisting of a single repeated character.

In addition to the blocksort, the archive contains a variation of
Distance Coding (DC) and the reverse algorithms for both DC and BWT.
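A reverse BWT can be sketched with the standard LF mapping. This is a generic textbook inverse in Python, not the author's implementation; `inverse_bwt` is a hypothetical name, and it assumes a sentinel-based output convention: `last` is the transformed block with the end-of-block sentinel removed, and `primary` is the row where that sentinel (smaller than every byte) belongs.

```python
def inverse_bwt(last: bytes, primary: int) -> bytes:
    """Invert a BWT given without its sentinel plus the sentinel's row.

    Assumed convention (hypothetical): the full last column is `last` with
    an end-of-block sentinel, smaller than every byte, inserted at `primary`.
    """
    n = len(last)
    # Rebuild the full last column; -1 stands in for the sentinel.
    column = list(last[:primary]) + [-1] + list(last[primary:])

    # occ[r] = how many times column[r] occurs in the rows above r.
    seen: dict[int, int] = {}
    occ = []
    for c in column:
        occ.append(seen.get(c, 0))
        seen[c] = seen.get(c, 0) + 1

    # first[c] = row where symbol c first appears in the sorted first column.
    first: dict[int, int] = {}
    total = 0
    for c in sorted(seen):
        first[c] = total
        total += seen[c]

    # Row 0 starts with the sentinel, so its last-column symbol is the final
    # byte of the block; the LF mapping then steps back one byte at a time.
    out = bytearray(n)
    row = 0
    for k in range(n - 1, -1, -1):
        c = column[row]
        out[k] = c
        row = first[c] + occ[row]
    return bytes(out)
```

Under this convention, `inverse_bwt(b"annbaa", 4)` returns `b"banana"`.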

Download blocksort.zip