Question

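I have seen many examples of calculating Euclidean distance for KNN, but none of them deal with sentiment classification.

For example, I have the sentence "a very close game".

How do I calculate its Euclidean distance to the sentence "a great game"?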

Answer 1

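Think of a sentence as a point in a multidimensional space. The Euclidean distance can only be calculated once you have defined a coordinate system. For example, you can introduce:

O1 - sentence length (Length)
O2 - number of words (WordsCount)
O3 - alphabetical center (I just made this up). It can be calculated as the arithmetic mean of the alphabetical centers of each word in the sentence.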

CharsIndex = Sum(Char.indexInWord) / CharsCountInWord
CharsCode = Sum(Char.charCode) / CharsCount
AlphWordCoordinate = [CharsIndex, CharsCode]
WordsIndex = Sum(Words.CharsIndex) / WordsCount
WordsCode = Sum(Words.CharsCode) / WordsCount
AlphaSentenceCoordinate = (WordsIndex^2 + WordsCode^2 + WordIndexInSentence^2)^(1/2)

So the Euclidean distance (of the sentence's point from the origin) is calculated as follows:

EuclidianSentenceDistance = (WordsCount^2 + Length^2 + AlphaSentenceCoordinate^2)^(1/2)

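Now every sentence can be converted to a point in three-dimensional space, e.g. P[Length, Words, AlphaCoordinate]. Once you have a distance, you can compare and classify sentences.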
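To make the "compare" step concrete, here is a minimal sketch of the distance between two sentences treated as points. It assumes only the two simplest coordinates (length and word count) and leaves the alphabetical coordinate out. The names `sentence_to_point` and `euclidean_distance` are made up for this illustration.

```python
import math

def sentence_to_point(sentence):
    """Map a sentence to the 2-D point (length in characters, word count)."""
    words = sentence.split()
    return (len(sentence), len(words))

def euclidean_distance(p, q):
    """Euclidean distance between two points with the same number of coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

p = sentence_to_point("a very close game")  # (17, 4)
q = sentence_to_point("a great game")       # (12, 3)
print(euclidean_distance(p, q))             # sqrt(26), about 5.099
```

The point is that the distance is taken between two points, coordinate by coordinate, rather than between each point and the origin.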

I guess this is not the ideal approach, but I wanted to show you the idea.

import math

def calc_word_alpha_center(word):
    """Return (mean character index, mean character code) of a word."""
    chars_index = 0
    chars_codes = 0
    for index, char in enumerate(word):
        chars_index += index
        chars_codes += ord(char)
    chars_count = len(word)
    index = chars_index / chars_count
    code = chars_codes / chars_count
    return (index, code)


def calc_alpha_distance(words):
    """Return the alphabetical coordinate of a sentence given as a list of words."""
    word_chars_index = 0
    word_code = 0
    word_index = 0
    for index, word in enumerate(words):
        point = calc_word_alpha_center(word)
        word_chars_index += point[0]
        word_code += point[1]
        word_index += index
    # Average the per-word values over the sentence
    chars_index = word_chars_index / len(words)
    code = word_code / len(words)
    index = word_index / len(words)
    return math.sqrt(math.pow(chars_index, 2) + math.pow(code, 2) + math.pow(index, 2))

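def calc_sentence_euclidean_distance(sentence):
    """Return the distance from the origin of the point [Length, WordsCount, AlphaCoordinate]."""
    length = len(sentence)

    # split() without arguments skips repeated whitespace, so no empty
    # word can cause a division by zero in calc_word_alpha_center
    words = sentence.split()
    words_count = len(words)

    alpha_distance = calc_alpha_distance(words)

    return math.sqrt(math.pow(length, 2) + math.pow(words_count, 2) + math.pow(alpha_distance, 2))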


sentence1 = "a great game"
sentence2 = "A great game"

distance1 = calc_sentence_euclidean_distance(sentence1)
distance2 = calc_sentence_euclidean_distance(sentence2)

print(sentence1)
print(str(distance1))

print(sentence2)
print(str(distance2))

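Console output (produced under Python 2, where / on integers truncates; Python 3 gives slightly different values):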

a great game
101.764433866
A great game
91.8477000256
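To tie this back to KNN, here is a minimal sketch of a k-nearest-neighbours vote over sentence points. Several things here are assumptions for illustration: only the two simplest coordinates (length and word count) are used, the labeled sentences and their labels are invented, and `sentence_to_point` and `knn_classify` are hypothetical names. `math.dist` needs Python 3.8 or newer.

```python
import math
from collections import Counter

def sentence_to_point(sentence):
    """Map a sentence to the 2-D point (length in characters, word count)."""
    return (len(sentence), len(sentence.split()))

def knn_classify(query, labeled_sentences, k=3):
    """Return the majority label among the k labeled sentences nearest to query."""
    query_point = sentence_to_point(query)
    # Sort the labeled sentences by Euclidean distance to the query point
    by_distance = sorted(
        labeled_sentences,
        key=lambda item: math.dist(query_point, sentence_to_point(item[0])),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Invented training data, for illustration only
training = [
    ("a great game", "positive"),
    ("what a fun match", "positive"),
    ("a very close game", "positive"),
    ("that was a long and painfully boring game", "negative"),
    ("the worst match I have seen all year", "negative"),
]

print(knn_classify("a good game", training, k=3))  # positive
```

Length and word count carry very little sentiment information, so this only demonstrates the mechanics: turn each sentence into a point, measure distances between points, and let the nearest labeled points vote.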
