Before discussing this, it will only be fair to state here that the transformation mentioned earlier brings to apparently distant strings closer.
Two strings may be compared for an exact match.
Many match engines are based on such exact matches. Let us consider the following strings:
1. Distance between two exactly same strings is 0
2. Distance between two input strings is non-negative
3. Distance between two strings increases (at least non-decreasing) when the similarity between the strings decreases.
- d (s1, s1) = 0
- d (s1, s1) ≥ 0
- d (s1, s1) ≥ d (s1, s3) when s1 is more similar to s3 than s2
Let m be the number of position-wise matching characters in s1 and s2
While fixing a value for δ, it has to be kept in mind if the length of s1 is one more than the length of s2 and s2 is a sub-string of s1 then d (s1, s2) = 1-1/(1 + δ)
If we want two strings as above (lengths differ by one and the smaller one is a sub-string of the other) to differ by 5 unit then 1-1/(1 + δ) = 0.05 => δ = 0.05 (appx.). In fact, we see that in order to have such strings closer, we need to set δ as a very small positive number.
The distance function defined above lies between 0 and 1 and a match probability can be defined using this distance function. For example, the match probability can be p (s1, s2) = 1 - d (s1, s2)