I have the following data, given in CSV format:
bi-gram term_frequency
health care 12
rfid chip 5
care health 8
Now I would like to generate a co-occurrence matrix of the unigrams:
        health  care  rfid  chip
health       0    20     0     0
care        20     0     0     0
rfid         0     0     0     8
chip         0     0     8     0
This is the current state of my code, but I don't know how to continue:
import csv

csv_file = "Mappe1.csv"
vocabulary = []
unigram1 = []
unigram2 = []
frequency_of_bigrams = {}

with open(csv_file, "r") as tdm:
    csvreader = csv.reader(tdm, delimiter=';', quotechar='|')
    next(csvreader)  # skip the header row
    with open("Term_constellation.txt", "w") as text_file:
        # map each bi-gram string to its term frequency
        for row in csvreader:
            frequency_of_bigrams[row[0]] = int(row[2])
        # split every bi-gram into its first and second word
        for key in frequency_of_bigrams:
            unigram1.append(key.split(' ')[0])
            unigram2.append(key.split(' ')[1])
        # de-duplicated, sorted list of all unigrams
        vocabulary = list(dict.fromkeys(unigram1 + unigram2))
        vocabulary.sort()
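I am particularly stuck on the equivalence "AB" = "BA", i.e. a bi-gram and its reverse should count as the same pair. Can anyone recommend a module that offers a simple way to do this, or help me think about the problem on the right track?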
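
To make the "AB" = "BA" requirement concrete, here is a minimal sketch of one possible approach using only the standard library. It is an illustration under assumptions, not a definitive solution: the name `build_cooccurrence` is hypothetical, the sample rows are hard-coded from the data above instead of being read from the CSV file, and the vocabulary is kept in order of first appearance rather than sorted. The idea is to sort the two words of each bi-gram so that both word orders map to the same key, sum the frequencies per key, and then write each total into both mirrored cells of the matrix.

```python
from collections import defaultdict

# Sample rows copied from the data above (assumption: in practice
# these pairs would come from the CSV file).
bigram_counts = [
    ("health care", 12),
    ("rfid chip", 5),
    ("care health", 8),
]


def build_cooccurrence(bigram_counts):
    """Return (vocabulary, matrix); counts are summed over both word orders."""
    pair_totals = defaultdict(int)
    words = []
    for bigram, count in bigram_counts:
        first, second = bigram.split()
        words.extend((first, second))
        # Sorting the pair makes "health care" and "care health" share one key.
        pair_totals[tuple(sorted((first, second)))] += count

    # De-duplicate while keeping the order of first appearance.
    vocabulary = list(dict.fromkeys(words))
    index = {word: i for i, word in enumerate(vocabulary)}

    size = len(vocabulary)
    matrix = [[0] * size for _ in range(size)]
    for (a, b), total in pair_totals.items():
        # Fill both mirrored cells so the matrix is symmetric.
        matrix[index[a]][index[b]] = total
        matrix[index[b]][index[a]] = total
    return vocabulary, matrix


vocabulary, matrix = build_cooccurrence(bigram_counts)
print(" ".join(vocabulary))
for word, row in zip(vocabulary, matrix):
    print(word, *row)
```

With the three sample rows, the health/care cells come out as 20 (12 + 8). The rfid/chip cells come out as 5, because the sample data contains only the single row "rfid chip 5".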