Module Ai4r::Data::Proximity
In: lib/ai4r/data/proximity.rb
This module provides classical distance functions.
Euclidean distance, or L2 norm. Parameters a and b are vectors with continuous attributes. Euclidean distance tends to form hyperspherical clusters (Clustering, Xu and Wunsch, 2009). Translations and rotations do not distort distance relations (Duda et al., 2001). If attributes are measured in different units, attributes with larger values and variance will dominate the metric.
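The following is a minimal sketch of the L2 norm described above, not the module's own code. The name euclidean_sketch and the sample vectors are hypothetical, chosen only for illustration. The second call shows how an attribute measured in larger units can dominate the result.

```ruby
# Hypothetical helper (an assumption, not part of Ai4r::Data::Proximity).
# Computes the square root of the sum of squared attribute differences.
def euclidean_sketch(a, b)
  raise ArgumentError, "vectors must have equal length" unless a.length == b.length
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Classic 3-4-5 right triangle.
puts euclidean_sketch([0.0, 0.0], [3.0, 4.0])  # prints 5.0

# Height in meters versus income in dollars: the 0.2 m difference is
# almost invisible next to the 100 dollar difference (result is close to 100).
puts euclidean_sketch([1.70, 30_000.0], [1.90, 30_100.0])
```

Rescaling or standardizing attributes before computing the distance is one common way to reduce this effect.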
The Hamming distance between two attribute vectors of equal length is the number of positions at which the corresponding attributes differ. This distance function is frequently used with binary attributes, though it can be used with other discrete attributes.
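As an illustration of the definition above, here is a small sketch that counts mismatching positions. The name hamming_sketch is hypothetical, and the equal-length check is an assumption taken from the wording of the definition.

```ruby
# Hypothetical helper (an assumption, not part of Ai4r::Data::Proximity).
# Counts the positions at which the two vectors hold different values.
def hamming_sketch(a, b)
  raise ArgumentError, "vectors must have equal length" unless a.length == b.length
  a.zip(b).count { |x, y| x != y }
end

# Binary attributes: positions 1 and 2 differ.
puts hamming_sketch([1, 0, 1, 1], [1, 1, 0, 1])  # prints 2

# Other discrete attributes work the same way: only the size differs.
puts hamming_sketch(%w[red small round], %w[red large round])  # prints 1
```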
City block, Manhattan distance, or L1 norm. Parameters a and b are vectors with continuous attributes.
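A minimal sketch of the L1 norm, assuming both vectors have the same length. The name manhattan_sketch is hypothetical and is used only for illustration.

```ruby
# Hypothetical helper (an assumption, not part of Ai4r::Data::Proximity).
# Sums the absolute differences between corresponding attributes.
def manhattan_sketch(a, b)
  raise ArgumentError, "vectors must have equal length" unless a.length == b.length
  a.zip(b).sum { |x, y| (x - y).abs }
end

# |1 - 4| + |2 - 6| = 3 + 4
puts manhattan_sketch([1.0, 2.0], [4.0, 6.0])  # prints 7.0
```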
The "Simple matching" distance between two attribute sets is derived from the number of values present in both sets. If sets a and b have lengths da and db, then:
S = 2/(da + db) * Number of values present on both sets
D = 1.0/S - 1
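The two formulas can be sketched as follows. The name simple_matching_sketch is hypothetical, and the sketch assumes that neither input contains repeated items, since it counts shared values with Ruby's array intersection. Note that when the sets share no values, S is 0 and D evaluates to Infinity under Ruby's floating-point division.

```ruby
# Hypothetical helper (an assumption, not part of Ai4r::Data::Proximity).
# Assumes a and b contain no repeated items.
def simple_matching_sketch(a, b)
  common = (a & b).length                       # values present in both sets
  s = 2.0 * common / (a.length + b.length)      # S = 2/(da + db) * common
  1.0 / s - 1                                   # D = 1.0/S - 1
end

# Two of three values shared: S = 4/6, so D is approximately 0.5.
puts simple_matching_sketch(%w[a b c], %w[b c d])

# Identical sets: S = 1, so D = 0.
puts simple_matching_sketch(%w[a b c], %w[a b c])  # prints 0.0
```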
Some considerations:
This is a computationally faster replacement for Euclidean distance. Parameters a and b are vectors with continuous attributes.
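The text does not name the function it describes. One natural reading, assumed here, is the squared Euclidean distance, which skips the square root while preserving the ordering of distances. The sketch below uses the hypothetical name squared_euclidean_sketch and shows that a nearest-neighbor choice does not need the square root.

```ruby
# Hypothetical helper (an assumption about what the text describes,
# not part of Ai4r::Data::Proximity).
# Sum of squared differences, without the final square root.
def squared_euclidean_sketch(a, b)
  raise ArgumentError, "vectors must have equal length" unless a.length == b.length
  a.zip(b).sum { |x, y| (x - y)**2 }
end

query  = [0.0, 0.0]
points = [[3.0, 4.0], [1.0, 1.0], [6.0, 8.0]]

# Squaring is monotonic for non-negative values, so the nearest point
# is the same as it would be under the true Euclidean distance.
nearest = points.min_by { |pt| squared_euclidean_sketch(query, pt) }
puts nearest.inspect  # prints [1.0, 1.0]
```

This only helps when distances are compared with each other; the squared value is not itself a substitute where the actual magnitude matters.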