0%

Hamming distance is a metric to measure difference between two strings of equal length or two data with same length(eg. two binary data with same bits).

# Definiton

In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. In another way, it measures the minimum number of substitutions required to change one string into the other, or the minimum number of errors that could have transformed one string into the other.

# Example

## String

the Hamming distance between below two strings is 8.
“Hello,world”
“world,hello”
11101011101 | 8
there is a python distance lib which can calculate hamming distance.

## Data

hamming distance between
`1011101` and
`1001001`
is 2.

Here data can be binary or octonary or decimal or hexadecimal, but the two data should be with same length in same system.

# Application

## Coding theory

In coding theory, hamming distance is used as error detecting and error correcting codes(ECC).
a code with minimum Hamming distance d between its codewords can detect at most d-1 errors and can correct (d-1)/2 errors.The latter number is also called the error-correcting capability of the code.

For error detecting, hammind distance is usde to evaluate the performace of algorithm.
Let’s say, there are fixed-length data d, then we count a checksum C using differet algorithm. then we have data(d, C). we can get the minimum hamming distance d in set(d, C). so the algorithm can detect d-1 errors and correct (d-1)/2 errors.
CRC is generally better than modular sum.

For error correcting, there is a kind of ECC named as Hamming Code.

## Programming

In softwar programming, there is a simple method to protect data from unexpected bit inverse(eg. soft error).
By define enum type like this:

the minimum hamming distance between `Red`,`Blue` and `Green` is 4.

# Generate data set with min hamming distance

`data=` defines the first number of data set; `min_hamming_distance` defines the min hanmming distance of the data set.

Reference：