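I am new to Python. I am trying to write a function that does the following, and to reuse it in later parts of my code: (what the function does):
Then I want to run calculations on the list returned by that function. However, the function (i.e. knearest_similarity(tfidf_datamatrix)) does not return anything, and the print statements in the second function (i.e. threshold_function()) display nothing. Could someone look at the code and tell me what I am doing wrong?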
def knearest_similarity(tfidf_datamatrix):
    k_nearest_cosineMean = []
    for datavector in tfidf_datamatrix:
        cosineValueSet = []
        for trainingvector in tfidf_vectorizer_trainingset:
            cosineValue = cx(datavector, trainingvector)
            cosineValueSet.append(cosineValue)
        # mean cosine similarity score of the top k nearest neighbours
        similarityMean_of_k_nearest_neighbours = np.mean(heapq.nlargest(k_nearest_neighbours, cosineValueSet))
        k_nearest_cosineMean.append(similarityMean_of_k_nearest_neighbours)
    print k_nearest_cosineMean
    return k_nearest_cosineMean
def threshold_function():
    mean_cosineScore_mean = np.mean(knearest_similarity(tfidf_matrix_testset))
    std_cosineScore_mean = np.std(knearest_similarity(tfidf_matrix_testset))
    threshold = mean_cosineScore_mean - (3*std_cosineScore_mean)
    # The mean is used to find the threshold
    print "The Mean of the mean of cosine similarity score for a normal Behaviour:", mean_cosineScore_mean
    # The standard deviation is also used to find the threshold
    print "The standard deviation of the mean of cosine similarity score:", std_cosineScore_mean
    print "The threshold for normal behaviour should be (Mean - 3*standard deviation):", threshold
    return threshold
EDIT
I tried defining two global variables for the functions to use (i.e. tfidf_vectorizer_trainingset and tfidf_matrix_testset).
# Fit the TF-IDF transform on the training data
tfidf_vectorizer_trainingset = tfidf_vectorizer.fit_transform(readfile(trainingdataDir)).toarray()
# TF-IDF transform the test set based on the training set
tfidf_matrix_testset = tfidf_vectorizer.transform(readfile(testingdataDir)).toarray()
But the print statements in threshold_function() now show the following:
The Mean of the mean of cosine similarity score for a normal Behaviour: nan
The standard deviation of the mean of cosine similarity score: nan
The threshold for normal behaviour should be (Mean - 3*standard deviation): nan
EDIT 2
I found that the first value in k_nearest_cosineMean is nan. After removing that value, I was able to get a valid calculation.
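One possible cause of a nan entry, offered only as an assumption because the definition of cx is not shown: if cx divides the dot product by the product of the two vector norms, then a document whose TF-IDF vector is all zeros (for example, a test file that shares no term with the training vocabulary) produces 0/0. A minimal sketch of a guarded cosine function follows; safe_cosine is a hypothetical name and the vectors are made-up placeholder data.

```python
import numpy as np

def safe_cosine(a, b):
    """Cosine similarity that returns 0.0 instead of nan for all-zero vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # Assumption: an empty document is treated as "not similar to anything".
        return 0.0
    return float(np.dot(a, b) / denom)

zero_doc = [0.0, 0.0, 0.0]   # placeholder: a document with no known terms
doc = [1.0, 2.0, 0.0]        # placeholder: an ordinary document vector

print(safe_cosine(doc, doc))       # similarity of a vector with itself
print(safe_cosine(zero_doc, doc))  # guarded case, no nan
```

Whether such a document should count as similarity 0 or be dropped depends on the data. np.nanmean and np.nanstd are an alternative that skips nan entries instead of removing them by hand.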
Answer 0 (score: 3)
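On the first line of threshold_function() you call knearest_similarity(tfidf_matrix_testset), but you never define what tfidf_matrix_testset is. You do the same on the second line, and on the third line you use the output of the second line. Give tfidf_matrix_testset a value.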
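The advice above can be sketched as follows, with the matrices passed in as arguments rather than read from globals, and with knearest_similarity called only once. This is a sketch under assumptions: cosine stands in for the asker's undefined cx, the parameter names training_matrix and k are hypothetical, and the small arrays are made-up placeholder data rather than real TF-IDF output.

```python
import heapq
import numpy as np

def cosine(a, b):
    # Stand-in for the asker's cx; assumes neither vector is all zeros.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def knearest_similarity(data_matrix, training_matrix, k):
    """For each data row, mean cosine similarity to its k most similar training rows."""
    k_nearest_means = []
    for datavector in data_matrix:
        scores = [cosine(datavector, tv) for tv in training_matrix]
        k_nearest_means.append(np.mean(heapq.nlargest(k, scores)))
    return k_nearest_means

def threshold_function(test_matrix, training_matrix, k):
    # Compute the scores once and reuse them for both the mean and the std.
    scores = knearest_similarity(test_matrix, training_matrix, k)
    return np.mean(scores) - 3 * np.std(scores)

# Placeholder data standing in for the TF-IDF matrices.
training = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
testing = np.array([[1.0, 0.0], [1.0, 1.0]])

threshold = threshold_function(testing, training, k=2)
print(threshold)
```

Passing the data explicitly means the function fails loudly with a missing-argument error when a matrix has not been built yet, instead of depending on a global being defined somewhere earlier in the script.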