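My code runs, but my function always returns 0.0. The code reads .txt files and builds a matrix in which each .txt file is one row, and each word of the file has its own column in that row.
I compare the rows pairwise. For each word in the union of two rows, I want to count how often it occurs in each of them. The code runs, but I get the wrong result (0.0).
I thought I might have made a mistake in the function that builds the matrix, but the matrix looks fine.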
Strangely, if I create the lists by hand:
a = ["a", "b", "c", "d"],
b = ["b", "c", "d", "e"]
it works, but when I change them to:
a = ["word 1", "word 2", "word 3", "word 4"],
b = ["word 2","word 3","word 4","word 5",]
the result is 0.0 again. I'm confused!
My code:
def bow_distance(a, b):
    p = 0
    if len(a) > len(b):
        max_words = len(a)
    else:
        max_words = len(b)
    list_words_ab = list(set(a) | set(b))
    len_bow_matrix = len(list_words_ab)
    bow_matrix = numpy.zeros(shape = (3, len_bow_matrix), dtype = str)
    while p < len_bow_matrix:
        bow_matrix[0, p] = str(list_words_ab[p])
        p = p+1
    p = 0
    while p < len_bow_matrix:
        bow_matrix[1, p] = a.count(bow_matrix[0, p])
        bow_matrix[2, p] = b.count(bow_matrix[0, p])
        p = p+1
    p = 0
    overlap = 0
    while p < len_bow_matrix:
        abs_difference = abs(float(bow_matrix[1, p]) - float(bow_matrix[2, p]))
        overlap = overlap + abs_difference
        p = p+1
    return (overlap/2)/max_num_parts
# Calculate the distances
i = 1
j = 1
while i < num_of_txt + 1:
    print(i)
    newfile = open("TXT_distance_" + str(i)+".txt", "w")
    while j < num_of_txt + 1:
        newfile.write(str(bow_distance(text_word_matrix[i-1], text_word_matrix[j-1])) + " ")
        j = j+1
    newfile.close()
    j = 1
    i = i+1
Answer 0 (score: 0)
At first glance I see two mistakes here:
a = ["a", "b", "c", "d"], <----- comma here
b = ["b", "c", "d", "e"]
it works, but when I change to:
a = ["word 1", "word 2", "word 3", "word 4"], <----- and here
b = ["word 2","word 3","word 4","word 5",] <----- and here inside the list
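The effect of those commas can be checked directly. A comma after the closing bracket binds the name to a one-element tuple that contains the list, while a comma before the closing bracket changes nothing. Separately, and as an observation that goes beyond the commas marked above: `numpy.zeros(..., dtype=str)` creates an array of one-character strings, so "word 1" would be stored as "w" and `a.count("w")` would return 0, whereas one-letter words survive intact. That appears to match the 0.0 described in the question. The sketch below avoids the fixed-width array by counting with `collections.Counter`. It keeps the name `bow_distance` from the question, and it assumes that `max_words` was meant where the question's code returns with the undefined name `max_num_parts`.

```python
from collections import Counter

# A comma after the closing bracket makes a 1-tuple holding the list.
a_with_comma = ["a", "b", "c", "d"],
# A trailing comma inside the brackets is harmless.
b_inner_comma = ["b", "c", "d", "e",]

print(type(a_with_comma).__name__)   # tuple
print(b_inner_comma)                 # ['b', 'c', 'd', 'e']


def bow_distance(a, b):
    """Half the summed absolute count differences over the union of
    words, divided by the length of the longer list.

    Assumption: the divisor is max_words, since max_num_parts is not
    defined anywhere in the question's code.
    """
    count_a = Counter(a)
    count_b = Counter(b)
    max_words = max(len(a), len(b))
    if max_words == 0:
        return 0.0  # two empty lists: nothing to compare
    # A Counter returns 0 for a missing word, so the union can be
    # iterated without checking membership first.
    overlap = sum(abs(count_a[w] - count_b[w]) for w in set(a) | set(b))
    return (overlap / 2) / max_words


a = ["word 1", "word 2", "word 3", "word 4"]
b = ["word 2", "word 3", "word 4", "word 5"]
print(bow_distance(a, b))            # 0.25
```

With the lists written without the trailing commas, the multi-character words give the same 0.25 as the one-letter lists, because the words are never copied into a fixed-width string array.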