Co-occurrence matrix for a given corpus

Time: 2019-08-24 12:14:51

Tags: python-3.x

I have computed the co-occurrence matrix for a given corpus and a list of keywords, but the result is incorrect.

The code is:

    import numpy as np
    import pandas as pd

    Courpus = ["abc def ijk pqr", "pqr klm opq", "lmn pqr xyz abc def pqr abc"]
    top_words = ["abc", "pqr", "def"]
    m = np.zeros([3, 3])
    cooccurrence_matrix = pd.DataFrame(m, index=top_words, columns=top_words)
    for sent in Courpus:
        word = sent.split(" ")
        for i, d in enumerate(word):
            # Compare word i with the words at positions i-2 .. i+1
            for j in range(max(i - 2, 0), min(i + 2, len(word))):
                try:
                    if word[i] != word[j]:
                        cooccurrence_matrix.loc[word[i], word[j]] += 1
                except KeyError:
                    # Words that are not in top_words are skipped
                    pass
    print(cooccurrence_matrix)

The output is:

         abc  pqr  def
    abc  0.0  2.0  3.0
    pqr  2.0  0.0  2.0
    def  2.0  1.0  0.0

Expected output:

         abc  pqr  def
    abc    0    3    3
    pqr    3    0    2
    def    3    2    0
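
One likely cause of the mismatch: `range()` excludes its stop value, so `range(max(i - 2, 0), min(i + 2, len(word)))` looks two words to the left but only one word to the right. The sketch below assumes the intended context is a symmetric window of two words on each side. The names `corpus`, `window` and `keywords` are introduced here for illustration only and are not part of the original code.

```python
import numpy as np
import pandas as pd

corpus = ["abc def ijk pqr", "pqr klm opq", "lmn pqr xyz abc def pqr abc"]
top_words = ["abc", "pqr", "def"]
window = 2  # assumption: count neighbours up to two positions away on each side

# Integer matrix labelled by the keywords on both axes.
cooccurrence_matrix = pd.DataFrame(
    np.zeros((len(top_words), len(top_words)), dtype=int),
    index=top_words,
    columns=top_words,
)

keywords = set(top_words)
for sent in corpus:
    words = sent.split()
    for i, w in enumerate(words):
        if w not in keywords:
            continue
        # range() excludes its stop value, so add 1 to include position i + window.
        for j in range(max(i - window, 0), min(i + window + 1, len(words))):
            other = words[j]
            # Skip non-keywords and identical words (this also skips j == i).
            if other in keywords and other != w:
                cooccurrence_matrix.loc[w, other] += 1

print(cooccurrence_matrix)
```

With the sample corpus this gives abc/pqr = 3, abc/def = 3 and pqr/def = 2, which matches the expected output above. Checking membership in `keywords` explicitly also avoids relying on an exception handler to skip words outside `top_words`.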

0 answers:

No answers yet.