Efficiently iterating over a list of strings to get a pairwise WMD distance matrix

Date: 2019-07-09 12:13:57

Tags: python matrix nlp word2vec wmd

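I am trying to generate a matrix of pairwise distances from a list of strings (newspaper articles).

WMD (Word Mover's Distance) is not implemented in scipy.spatial.distance.pdist, so I hooked this implementation, https://github.com/src-d/wmd-relax, into spaCy. However, I can't figure out how to iterate over the list to build the distance matrix.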
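Independent of spaCy, the iteration itself can be sketched as a generic helper that accepts any pairwise distance function. This is a minimal sketch assuming the distance is symmetric with d(x, x) = 0 (true for WMD), so only the n(n-1)/2 pairs above the diagonal are evaluated and then mirrored. The helper name pairwise_matrix and the toy word-count metric are hypothetical, for illustration only; they are not part of wmd-relax or spaCy.

```python
from itertools import combinations
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def pairwise_matrix(items: Sequence[T], dist: Callable[[T, T], float]) -> List[List[float]]:
    """Hypothetical helper: square distance matrix for any symmetric metric.

    Assumes dist(a, b) == dist(b, a) and dist(a, a) == 0, so each unordered
    pair is evaluated exactly once and copied to both cells; the diagonal stays 0.0.
    """
    n = len(items)
    m = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):  # all pairs with i < j
        d = float(dist(items[i], items[j]))
        m[i][j] = m[j][i] = d
    return m


# Toy stand-in for WMD (illustration only): absolute difference in word counts.
def word_count_gap(a: str, b: str) -> float:
    return abs(len(a.split()) - len(b.split()))


articles = ["the cat sat", "a dog barked loudly", "hi"]
print(pairwise_matrix(articles, word_count_gap))
# [[0.0, 1.0, 2.0], [1.0, 0.0, 3.0], [2.0, 3.0, 0.0]]
```

With the wmd-relax hook installed, the same helper could be called with the parsed spaCy docs and lambda a, b: a.similarity(b) as the distance; compared with a full double loop, this roughly halves the number of (expensive) WMD evaluations.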

1 answer:

Answer 0 (score: 0)

Adapting the spaCy example from the wmd-relax documentation:


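import numpy as np
import spacy
import wmd

# spaCy 2.x API, as used in the wmd-relax README
# (spaCy 3's add_pipe() expects a registered component name instead)
nlp = spacy.load('en_core_web_md')
nlp.add_pipe(wmd.WMD.SpacySimilarityHook(nlp), last=True)

# articles is the list of article strings from the question;
# nlp.pipe() processes them in batches, which is faster than calling nlp() per article
docs = list(nlp.pipe(articles))

n = len(docs)
result = np.zeros((n, n))  # the diagonal stays 0: the WMD of a document to itself is 0
for i in range(n):
    for j in range(i + 1, n):
        # with the hook installed, Doc.similarity returns the WMD *distance*
        # (lower means more similar), not a similarity score
        d = docs[i].similarity(docs[j])
        # WMD is symmetric, so compute each pair once and fill both cells
        result[i, j] = result[j, i] = d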

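See the NumPy matrix documentation. Note that NumPy no longer recommends np.matrix, even for linear algebra; a regular ndarray (as created by np.array or np.zeros) is the usual choice.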
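Because the question starts from scipy.spatial.distance.pdist, note that pdist returns a condensed 1-D vector of the distances above the diagonal, not a square matrix. The sketch below assumes a symmetric square matrix with a zero diagonal, such as result above, and extracts that condensed vector with np.triu_indices, which walks the pairs in the same row-major i < j order. The name to_condensed is hypothetical; if SciPy is available, scipy.spatial.distance.squareform performs the same conversion in both directions.

```python
import numpy as np


def to_condensed(square: np.ndarray) -> np.ndarray:
    """Hypothetical helper: square distance matrix -> pdist-style condensed vector."""
    square = np.asarray(square, dtype=float)
    if square.ndim != 2 or square.shape[0] != square.shape[1]:
        raise ValueError("expected a square matrix")
    if not np.allclose(square, square.T) or not np.allclose(np.diag(square), 0.0):
        raise ValueError("expected a symmetric matrix with a zero diagonal")
    n = square.shape[0]
    # Indices of the upper triangle without the diagonal, row by row:
    # (0, 1), (0, 2), ..., (1, 2), ...
    rows, cols = np.triu_indices(n, k=1)
    return square[rows, cols]


dm = np.array([[0.0, 1.0, 2.0],
               [1.0, 0.0, 3.0],
               [2.0, 3.0, 0.0]])
print(to_condensed(dm))  # [1. 2. 3.]
```

The condensed vector is the form that, for example, scipy.cluster.hierarchy.linkage accepts as a precomputed distance matrix, which is one way to cluster the articles by WMD.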