Question

I wrote the following code to compute Jaccard similarity:

import pandas as pd
import csv

df = pd.read_csv('data.csv', usecols=[0], names=['Question'],
                 encoding='utf-8')

out = []
for i in df['Question']:
    str1 = i
    for q in df['Question']:
        str2 = q
        a = set(str1.split())
        b = set(str2.split())
        c = a.intersection(b)
        out.append({'Question': q,
                    'Result': float(len(c)) / (len(a) + len(b) - len(c))})


new_df = pd.DataFrame(out, columns=['Question','Result'])
new_df.to_csv('output.csv', index=False, encoding='utf-8')

The output file looks like this:

Question          Result
The sky is blue    1.0
The ocean is blue  0.6
The sky is blue    0.6
The ocean is blue  1.0

It does return results. Now I would like to change the CSV output so that it shows the results like this:

Question          The sky is blue The ocean is blue
The sky is blue    1.0             0.6
The ocean is blue  0.6             1.0

I tried changing the code to use writerows, but I think I am still missing something. Thanks.

Answer 1

Use defaultdict with the DataFrame constructor:

from collections import defaultdict

out1 = defaultdict(dict)
for i in df['Question']:
    str1 = i
    for q in df['Question']:
        str2 = q
        a = set(str1.split())
        b = set(str2.split())
        c = a.intersection(b)
        # outer key becomes the column label, inner key the row label
        out1[i][q] = float(len(c)) / (len(a) + len(b) - len(c))
print(out1)

df = pd.DataFrame(out1)
print(df)
                   The sky is blue  The ocean is blue
The ocean is blue              0.6                1.0
The sky is blue                1.0                0.6
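
To get the CSV layout asked for in the question, the square matrix can also be built directly and written with to_csv. The following is only a minimal sketch under some assumptions: the two sample questions stand in for the contents of data.csv, jaccard is a hypothetical helper name, and treating two empty strings as 0.0 is my own choice, not something from the question.

```python
import io

import pandas as pd


def jaccard(s1, s2):
    """Jaccard similarity of the whitespace-separated word sets of two strings."""
    a, b = set(s1.split()), set(s2.split())
    union = a | b
    # Assumption: two empty strings score 0.0 instead of raising ZeroDivisionError.
    return len(a & b) / len(union) if union else 0.0


# Sample data standing in for the 'Question' column read from data.csv.
questions = ['The sky is blue', 'The ocean is blue']

# One row per question and one column per question, in the original order.
matrix = [[jaccard(q1, q2) for q2 in questions] for q1 in questions]
result = pd.DataFrame(matrix, index=questions, columns=questions)

# Written to an in-memory buffer here; a path such as 'output.csv' works the same way.
buffer = io.StringIO()
result.to_csv(buffer, index_label='Question')
csv_text = buffer.getvalue()
print(csv_text)
```

Passing index=questions and columns=questions explicitly keeps the rows and columns in input order, and index_label supplies the 'Question' header for the first column.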

Original solution with DataFrame.pivot:

out = []
for i in df['Question']:
    str1 = i
    for q in df['Question']:
        str2 = q
        a = set(str1.split())
        b = set(str2.split())
        c = a.intersection(b)
        out.append({'Question1': q, 'Question2': i,
                    'Result': float(len(c)) / (len(a) + len(b) - len(c))})

# keyword arguments are required by pivot in pandas 2.0 and later
df = pd.DataFrame(out).pivot(index='Question1', columns='Question2',
                             values='Result')
print(df)
Question2          The ocean is blue  The sky is blue
Question1                                            
The ocean is blue                1.0              0.6
The sky is blue                  0.6              1.0
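
Note that pivot sorts the row and column labels, so the result above is in alphabetical order rather than in the order of the input. If the original order matters, one option is to reindex after pivoting. This is a rough sketch, assuming the questions are unique (pivot raises an error on duplicate index/column pairs) and using sample data and the illustrative names rows and wide in place of the real file:

```python
import pandas as pd

# Sample data standing in for the 'Question' column read from data.csv.
questions = ['The sky is blue', 'The ocean is blue']

rows = []
for q1 in questions:
    a = set(q1.split())
    for q2 in questions:
        b = set(q2.split())
        common = a & b
        rows.append({'Question1': q1, 'Question2': q2,
                     'Result': len(common) / (len(a) + len(b) - len(common))})

# Long format to wide format; labels come out sorted at this point.
wide = pd.DataFrame(rows).pivot(index='Question1', columns='Question2',
                                values='Result')

# Restore the input order on both axes.
wide = wide.reindex(index=questions, columns=questions)

# Rename the axes so the CSV header reads 'Question' and nothing else.
wide.index.name = 'Question'
wide.columns.name = None

print(wide)
# wide.to_csv('output.csv', encoding='utf-8') would then write the matrix to disk.
```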

Saving Jaccard similarity in a CSV file
