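I have seen many examples of computing the Euclidean distance for KNN, but none that apply it to sentiment classification.
For example, I have the sentence "a very close game".
How do I calculate its Euclidean distance to the sentence "a great game"?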
Answer 0 (score: 1)
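Think of a sentence as a point in a multidimensional space. A Euclidean distance can only be calculated once a coordinate system has been defined. For example, you could introduce
O2 - the alphabetical center (something I just came up with). It can be calculated as the arithmetic mean of the alphabetical centers of each word in the sentence.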
CharsIndex = Sum(Char.indexInWord) / CharsCountInWord;
CharsCode = Sum(Char.charCode) / CharsCountInWord;
AlphWordCoordinate = [CharsIndex, CharsCode];
WordsIndex = Sum(Words.CharsIndex) / WordsCount;
WordsCode = Sum(Words.CharsCode) / WordsCount;
AlphaSentenceCoordinate = (WordsIndex^2 + WordsCode^2 + WordIndexInSentence^2)^(1/2);
So the Euclidean distance is calculated as follows:
EuclidianSentenceDistance = (WordsCount^2 + Length^2 + AlphaSentenceCoordinate^2)^(1/2)
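Each sentence can now be converted to a point in three-dimensional space, e.g. P [Length, Words, AlphaCoordinate]. Once you have a distance, you can compare and classify sentences.
I guess this is not an ideal approach, but I wanted to show you the idea.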
import math

# Note: the console output shown for this script matches Python 2, where `/`
# on two integers floors the result. Under Python 3, `/` is true division
# and the printed values differ slightly.


def calc_word_alpha_center(word):
    # Sum the character positions and the character codes of the word.
    chars_index = 0
    chars_codes = 0
    for index, char in enumerate(word):
        chars_index += index
        chars_codes += ord(char)
    # Average both sums over the number of characters.
    chars_count = len(word)
    index = chars_index / chars_count
    code = chars_codes / chars_count
    return (index, code)


def calc_alpha_distance(words):
    # Accumulate each word's alphabetical center and its position in the sentence.
    word_chars_index = 0
    word_code = 0
    word_index = 0
    for index, word in enumerate(words):
        point = calc_word_alpha_center(word)
        word_chars_index += point[0]
        word_code += point[1]
        word_index += index
    # Average over the number of words, then take the norm of the three means.
    chars_index = word_chars_index / len(words)
    code = word_code / len(words)
    index = word_index / len(words)
    return math.sqrt(math.pow(chars_index, 2) + math.pow(code, 2) + math.pow(index, 2))


def calc_sentence_euclidean_distance(sentence):
    # Distance of the point [Length, Words, AlphaCoordinate] from the origin.
    length = len(sentence)
    words = sentence.split(" ")
    words_count = len(words)
    alpha_distance = calc_alpha_distance(words)
    return math.sqrt(math.pow(length, 2) + math.pow(words_count, 2) + math.pow(alpha_distance, 2))


sentence1 = "a great game"
sentence2 = "A great game"
distance1 = calc_sentence_euclidean_distance(sentence1)
distance2 = calc_sentence_euclidean_distance(sentence2)
print(sentence1)
print(str(distance1))
print(sentence2)
print(str(distance2))
Console output (Python 2):
a great game
101.764433866
A great game
91.8477000256
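The script above gives one number per sentence, which is the distance of its point from the origin. To compare two sentences directly, as the question asks, one option is to keep the three coordinates P [Length, Words, AlphaCoordinate] and take the Euclidean distance between the two points. The following is only a minimal sketch of that idea, assuming Python 3 (true division) and splitting on any whitespace; the function names `word_center`, `alpha_coordinate`, `sentence_point` and `sentence_distance` are hypothetical and not part of the original answer.

```python
import math


def word_center(word):
    """Return (mean character position, mean character code) of one word."""
    n = len(word)
    mean_index = sum(range(n)) / n
    mean_code = sum(ord(ch) for ch in word) / n
    return mean_index, mean_code


def alpha_coordinate(words):
    """Norm of (mean char position, mean char code, mean word position)."""
    centers = [word_center(w) for w in words]
    m = len(words)
    chars_index = sum(c[0] for c in centers) / m
    code = sum(c[1] for c in centers) / m
    word_index = sum(range(m)) / m
    return math.sqrt(chars_index ** 2 + code ** 2 + word_index ** 2)


def sentence_point(sentence):
    """Map a sentence to the point (Length, Words, AlphaCoordinate)."""
    words = sentence.split()  # assumption: split on any whitespace
    if not words:
        raise ValueError("sentence must contain at least one word")
    return (len(sentence), len(words), alpha_coordinate(words))


def sentence_distance(a, b):
    """Euclidean distance between the points of two sentences."""
    pa, pb = sentence_point(a), sentence_point(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)))


print(sentence_distance("a very close game", "a great game"))
```

A KNN classifier could then rank labelled sentences by `sentence_distance` to the query. Note that these coordinates describe length and spelling rather than meaning, so, as the answer itself says, this is an illustration of the idea rather than a good sentiment feature.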