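I am trying to use pandas to format my results as a proper table, but I can't get it to work. For now I am using a plain print() call, which gives output that is hard to read.
I tried pandas but still could not get the output in the format I want. I don't know how to get the source document, the target document, and the similarity score into a table.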
def calculate_similarity(self, source_doc, target_docs=None, threshold=0):
    """Calculate and return similarity scores between the given source
    document and all the target documents."""
    # Avoid a mutable default argument; treat "no targets" as an empty list.
    if target_docs is None:
        target_docs = []
    # Accept a single target document passed as a plain string.
    if isinstance(target_docs, str):
        target_docs = [target_docs]

    source_vec = self.vectorize(source_doc)
    results = []
    for doc in target_docs:
        target_vec = self.vectorize(doc)
        sim_score = self._cosine_sim(source_vec, target_vec)
        if sim_score > threshold:
            results.append({'score': sim_score, 'doc': doc})
    # Sort results by score in descending order
    results.sort(key=lambda k: k['score'], reverse=True)
    return results
ds = DocSim(w2v_model)

# Calculate the similarity scores between each source rule and the target rules.
source_rule = ['2.1.1 Context', '2.2.3 Value']
target_rule = ['2.1.1 Context', '2.1.2.4 Assist Failed Train']
if isinstance(source_rule, str):
    source_rule = [source_rule]

# For each source rule, get the target rules with their similarity scores
for rule in source_rule:
    sim_scores = ds.calculate_similarity(rule, target_rule)
    print("Similarity with {} is {}".format(rule, sim_scores))
The actual result is:
Similarity with 2.1.1 Context is [{'score': 1.0000001, 'doc': '2.1.1 Context'}, {'score': 0.0024745876, 'doc': '2.1.2.4 Assist Failed Train'}]
Similarity with 2.2.3 Value is [{'score': 0.24251467, 'doc': '2.1.1 Context'}, {'score': 0.056424055, 'doc': '2.1.2.4 Assist Failed Train'}]
I want the result in table format:
   Source doc    | Target doc                  | score
0  2.1.1 Context | 2.1.1 Context               | 1.000000
1                | 2.1.2.4 Assist Failed Train | 0.002475
0  2.2.3 Value   | 2.1.1 Context               | 0.24251467
1                | 2.1.2.4 Assist Failed Train | 0.056424055
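
One possible way to get such a table, as a minimal sketch rather than a definitive solution: flatten the list of dicts returned for each source rule into one row per (source, target) pair and pass the rows to pd.DataFrame. The names sample_results and results_to_frame are hypothetical and introduced only for illustration; sample_results stands in for the real calls to ds.calculate_similarity, with the scores copied from the output above.

```python
import pandas as pd

# Hypothetical stand-in for ds.calculate_similarity(rule, target_rule);
# the values are copied from the printed output in the question.
sample_results = {
    "2.1.1 Context": [
        {"score": 1.0000001, "doc": "2.1.1 Context"},
        {"score": 0.0024745876, "doc": "2.1.2.4 Assist Failed Train"},
    ],
    "2.2.3 Value": [
        {"score": 0.24251467, "doc": "2.1.1 Context"},
        {"score": 0.056424055, "doc": "2.1.2.4 Assist Failed Train"},
    ],
}


def results_to_frame(results_by_source):
    """Flatten {source: [{'score': ..., 'doc': ...}, ...]} into one DataFrame."""
    rows = [
        {"Source doc": source, "Target doc": hit["doc"], "score": hit["score"]}
        for source, hits in results_by_source.items()
        for hit in hits
    ]
    # Passing columns explicitly keeps the layout stable even with no rows.
    return pd.DataFrame(rows, columns=["Source doc", "Target doc", "score"])


df = results_to_frame(sample_results)

# Optional: blank out repeated source names so each one is shown only once.
display_df = df.copy()
display_df["Source doc"] = display_df["Source doc"].mask(
    display_df["Source doc"].duplicated(), ""
)

print(display_df.to_string(index=False))
```

With the real model, sample_results could presumably be replaced by something like {rule: ds.calculate_similarity(rule, target_rule) for rule in source_rule}. Keeping df with the source name on every row and blanking the repeats only in display_df means the full frame stays usable for filtering or sorting.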