Calculating Euclidean distance for KNN

Time: 2017-09-05 17:01:53

Tags: machine-learning sentiment-analysis knn nearest-neighbor euclidean-distance

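I have seen many examples of calculating the Euclidean distance for KNN, but none of them deal with sentiment classification.

For example, I have the sentence "a very close game".

How do I calculate its Euclidean distance to the sentence "a great game"?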

1 answer:

Answer 0: (score: 1)

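Think of a sentence as a point in a multidimensional space. The Euclidean distance can be calculated only after you have defined a coordinate system. For example, you can introduce:

  1. O1 - sentence length (Length)
  2. O2 - number of words (WordsCount)
  3. O3 - alphabetical center (I just made this one up). It can be calculated as the arithmetic mean of the alphabetical center of each word in the sentence.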

    CharsIndex = Sum(Char.indexInWord) / CharsCountInWord;
    CharsCode = Sum(Char.charCode) / CharsCountInWord;
    AlphWordCoordinate = [CharsIndex, CharsCode];
    WordsIndex = Sum(Words.CharsIndex) / WordsCount;
    WordsCode = Sum(Words.CharsCode) / WordsCount;
    AlphaSentenceCoordinate = (WordsIndex^2 + WordsCode^2 + WordIndexInSentence^2)^(1/2);

  4. So the Euclidean distance can be calculated as follows:

    EuclidianSentenceDistance = (WordsCount^2 + Length^2 + AlphaSentenceCoordinate^2)^(1/2)
    

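    Now every sentence can be converted to a point in three-dimensional space, e.g. P[Length, WordsCount, AlphaCoordinate]. Once you have a distance, you can compare and classify sentences.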
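
To compare two sentences with each other, rather than measure a single sentence against the origin, the standard Euclidean distance between their points can be used. The symbols L, W and A below are shorthand introduced here (an assumption, not the answer's notation) for Length, WordsCount and AlphaSentenceCoordinate of the sentence points P and Q:

```latex
d(P, Q) = \sqrt{(L_P - L_Q)^2 + (W_P - W_Q)^2 + (A_P - A_Q)^2}
```

Setting Q to the origin (0, 0, 0) reduces this to the single-sentence formula given in item 4.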

    I guess this is not an ideal approach, but I wanted to show you the idea.

    import math


    def calc_word_alpha_center(word):
        """Return (mean character index, mean character code) for one word."""
        chars_index = 0
        chars_codes = 0
        for index, char in enumerate(word):
            chars_index += index
            chars_codes += ord(char)
        chars_count = len(word)
        return (chars_index / chars_count, chars_codes / chars_count)


    def calc_alpha_distance(words):
        """Average the per-word centers and word positions, then take the norm."""
        word_chars_index = 0
        word_code = 0
        word_index = 0
        for index, word in enumerate(words):
            point = calc_word_alpha_center(word)
            word_chars_index += point[0]
            word_code += point[1]
            word_index += index
        chars_index = word_chars_index / len(words)
        code = word_code / len(words)
        index = word_index / len(words)
        return math.sqrt(chars_index ** 2 + code ** 2 + index ** 2)


    def calc_sentence_euclidean_distance(sentence):
        """Distance of the point [Length, WordsCount, AlphaCoordinate] from the origin."""
        length = len(sentence)

        # split() without arguments skips repeated spaces, so no empty words
        # reach calc_word_alpha_center (which would divide by zero).
        words = sentence.split()
        words_count = len(words)

        alpha_distance = calc_alpha_distance(words)

        return math.sqrt(length ** 2 + words_count ** 2 + alpha_distance ** 2)


    sentence1 = "a great game"
    sentence2 = "A great game"

    distance1 = calc_sentence_euclidean_distance(sentence1)
    distance2 = calc_sentence_euclidean_distance(sentence2)

    print(sentence1)
    print(distance1)

    print(sentence2)
    print(distance2)
    

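    Console output (these values come from Python 2, where `/` on two integers is floor division; Python 3 prints slightly different numbers):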

    a great game
    101.764433866
    A great game
    91.8477000256
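
The numbers above are each sentence's distance from the origin. For KNN you would normally measure the distance between two sentence points and let the nearest labeled neighbors vote. The following is a minimal Python 3 sketch of that idea, not a definitive implementation. It assumes the same three coordinates as the answer; the helper names `sentence_to_point`, `euclidean_distance` and `knn_classify`, as well as the labeled training sentences, are hypothetical and invented for illustration only.

```python
import math
from collections import Counter


def sentence_to_point(sentence):
    """Map a non-empty sentence to the point (Length, WordsCount, AlphaCoordinate)."""
    words = sentence.split()
    # Per-word alphabetical center: (mean character index, mean character code).
    centers = [
        (sum(range(len(w))) / len(w), sum(ord(c) for c in w) / len(w))
        for w in words
    ]
    mean_index = sum(c[0] for c in centers) / len(words)
    mean_code = sum(c[1] for c in centers) / len(words)
    mean_position = sum(range(len(words))) / len(words)
    alpha = math.sqrt(mean_index ** 2 + mean_code ** 2 + mean_position ** 2)
    return (len(sentence), len(words), alpha)


def euclidean_distance(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_classify(query, labeled_sentences, k=3):
    """Majority vote among the k labeled sentences nearest to the query."""
    query_point = sentence_to_point(query)
    ranked = sorted(
        labeled_sentences,
        key=lambda item: euclidean_distance(query_point, sentence_to_point(item[0])),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data, invented purely for illustration.
training = [
    ("a great game", "positive"),
    ("what a fantastic match", "positive"),
    ("a boring game", "negative"),
    ("that was a terrible match", "negative"),
]

p1 = sentence_to_point("a very close game")
p2 = sentence_to_point("a great game")
print(euclidean_distance(p1, p2))
print(knn_classify("a very close game", training, k=3))
```

An odd `k` is used so that a two-class vote cannot tie. The coordinates have very different scales (character codes dominate), so in practice you may want to normalize each coordinate before measuring distances.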