Adaptable Research

Bijective Variable Length Quantities

A Variable Length Quantity is a codec between arbitrary precision non-negative integers, that is 0, 1, 2, 3, and so on until you run out of space, and byte strings.

Description

The basic form of a VLQ is that the final byte has the most significant bit set to 0, terminating the byte string, preceding bytes have it set to 1. Then the remaining 7 bits in each byte hold a base-128 digit, with the first byte holding the most significant digit.

This variant provides a bijective codec by biasing the quantity such that, if all the digits of an n+1 digit sequence are 0 the quantity is 1 more than the maximum that can be encoded with n digits.

For example, the byte string 92 30 holds the digits 18 48 (base 128), which is 2352 (base 10), the maximum value representable by a single digit is 127, so that gets a bias of 128, for a decoded quantity of 2480.

This biasing can be efficiently implemented during decoding by adding 1 before incorporating subsequent digits, and during encoding by subtracting 1 after extracting each digit.

Properties

Compact

Compactness is, of course, dependent on the distribution of values to be encoded.

The ideal distribution for this VLQ is:

Fast

Decoding operates on bytes (and the quantity type), uses addition, bit shifts, and bit masks, reads data sequentially. Encoding is similar, but uses subtraction instead, and, unfortunately, produces its output in reverse order.

Bijective

This codec is a bijection between quantities and a variable number of bytes. Notably, each byte string ending in a byte with the most significant bit cleared represents a distinct array of quantities.

Order Preserving

It preserves the order, and as the bijection is complete, there’s no validation required before sorting.

Self-Terminating

This codec is strongly self-terminating, the final byte can be found from any point in the byte string.

Self-Synchronizing

However, it is only weakly self-synchronizing. The codec requires external information to know that the current byte is the beginning of a value.

Usage and Implementations

Applications

Other Encodings

Libraries

published