Word co-occurrence of unigram combinations from given bigram data in Python

Posted: 2019-06-06 15:22:59

Tags: python-3.x csv find-occurrences

I have the following data in CSV format:

bi-gram        term_frequency  

health care    12           
rfid chip       5
care health     8

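Now I want to generate a co-occurrence matrix of the unigrams, in which the frequencies of "A B" and "B A" are added together (so health/care = 12 + 8 = 20):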

       health   care   rfid   chip
health   0       20     0      0
care     20      0      0      0
rfid     0       0      0      5
chip     0       0      5      0
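
Stated as a formula (a sketch of the rule the expected matrix implies, writing f for the term frequency of a bigram and treating a bigram that is absent from the data as f = 0), every off-diagonal cell sums both word orders:

```latex
M_{a,b} = M_{b,a} = f(a\,b) + f(b\,a), \qquad a \neq b

M_{\text{health},\text{care}} = f(\text{health care}) + f(\text{care health}) = 12 + 8 = 20
M_{\text{rfid},\text{chip}} = f(\text{rfid chip}) + f(\text{chip rfid}) = 5 + 0 = 5
```

So with the sample data, the rfid/chip cell comes out as 5, matching the frequency of "rfid chip".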

This is the current state of my code, but I don't know how to proceed:

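import csv

csv_file = "Mappe1.csv"
vocabulary = []
unigram1 = []
unigram2 = []
frequency_of_bigrams = {}
with open(csv_file, "r", newline="") as tdm:
    csvreader = csv.reader(tdm, delimiter=';', quotechar='|')

    next(csvreader)  # skip the header row (bi-gram;term_frequency)
    with open("Term_constellation.txt", "w") as text_file:  # output file, not written to yet
        for row in csvreader:
            # row[0] is the bigram, row[1] its term frequency (there is no row[2])
            frequency_of_bigrams[row[0]] = int(row[1])

# split every bigram "A B" into its first and second unigram
for key in frequency_of_bigrams:
    first, second = key.split(' ')
    unigram1.append(first)
    unigram2.append(second)

# dict.fromkeys removes duplicates while keeping order; then sort alphabetically
vocabulary = list(dict.fromkeys(unigram1 + unigram2))
vocabulary.sort()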
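
One possible way to continue from a `frequency_of_bigrams` dict and a sorted vocabulary is to map each word to a row/column index and add every bigram's count to both mirrored cells, which handles "A B" = "B A" automatically. This is a minimal stdlib-only sketch, assuming every bigram is exactly two space-separated words; the function name `cooccurrence_matrix` is hypothetical, and the sample dict stands in for the data read from Mappe1.csv.

```python
import csv
import sys

# Sample data from the question; hypothetical stand-in for the dict
# that the script above builds from Mappe1.csv.
frequency_of_bigrams = {"health care": 12, "rfid chip": 5, "care health": 8}


def cooccurrence_matrix(freqs):
    """Return (vocabulary, matrix) where matrix[i][j] is the summed
    frequency of the bigrams "vocabulary[i] vocabulary[j]" and
    "vocabulary[j] vocabulary[i]"."""
    vocabulary = sorted({word for bigram in freqs for word in bigram.split(' ')})
    index = {word: i for i, word in enumerate(vocabulary)}
    n = len(vocabulary)
    matrix = [[0] * n for _ in range(n)]
    for bigram, count in freqs.items():
        a, b = bigram.split(' ')
        i, j = index[a], index[b]
        matrix[i][j] += count
        if i != j:  # a bigram like "care care" is added to the diagonal only once
            matrix[j][i] += count
    return vocabulary, matrix


vocabulary, matrix = cooccurrence_matrix(frequency_of_bigrams)

# Print the matrix with a header row and a label column; pass an open
# file handle (e.g. for Term_constellation.txt) instead of sys.stdout to save it.
writer = csv.writer(sys.stdout, delimiter=';')
writer.writerow([""] + vocabulary)
for word, row in zip(vocabulary, matrix):
    writer.writerow([word] + row)
```

Plain nested lists keep it dependency-free; for a large vocabulary a NumPy array or pandas DataFrame would be the more convenient container.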

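I'm particularly unsure how to handle the equivalence "A B" = "B A". Can anyone recommend a module that offers a simple way to do this, or help me think about the problem in the right direction?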
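
For the "A B" = "B A" part specifically, the standard library may be enough: one option is to normalize each word pair to an order-independent key (a sorted tuple here; a `frozenset` would also work as long as the two words differ) and accumulate counts with `collections.Counter`. A minimal sketch, assuming two-word bigrams; `merge_symmetric` is a hypothetical helper and the dict is the question's sample data.

```python
from collections import Counter

# Sample frequencies from the question; hypothetical in-memory stand-in
# for the contents of the CSV file.
frequency_of_bigrams = {"health care": 12, "rfid chip": 5, "care health": 8}


def merge_symmetric(freqs):
    """Sum the frequencies of "A B" and "B A" under one order-independent key."""
    pair_counts = Counter()
    for bigram, count in freqs.items():
        a, b = bigram.split(' ')
        # Sorting the two words maps ("health", "care") and
        # ("care", "health") to the same key ("care", "health").
        pair_counts[tuple(sorted((a, b)))] += count
    return pair_counts


pair_counts = merge_symmetric(frequency_of_bigrams)
print(pair_counts)
```

Each merged pair can then be written into both cells (a, b) and (b, a) of the matrix.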

0 answers

No answers yet.