Question

Below is a gensim example. Every time I run it, it gives different results, so I can't trust that gensim works well.

from gensim import corpora, models, similarities
from collections import defaultdict

documents = ["Human machine interface for lab abc computer applications",          # 0
             "A survey of user opinion of computer system response time",          # 1
             "The EPS user interface management system",                           # 2
             "System and human system engineering testing of EPS",                 # 3
             "Relation of user perceived response time to error measurement",      # 4
             "The generation of random binary unordered trees",                    # 5
             "The intersection graph of paths in trees",                           # 6
             "Graph minors IV Widths of trees and well quasi ordering",            # 7 
             "Graph minors A survey"]                                              # 8


stoplist = set('for a of the and to in'.split())
texts = [[word for word in document.lower().split() if word not in stoplist]
         for document in documents]

frequency = defaultdict(int)
for text in texts:
    for token in text:
        frequency[token] += 1
texts = [[token for token in text if frequency[token] > 1]
         for text in texts]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2)
index = similarities.MatrixSimilarity(lda[corpus])


doc = "Human computer interaction"
vec_bow = dictionary.doc2bow(doc.lower().split())
vec_lda = lda[vec_bow]
sims = index[vec_lda]
sims = sorted(enumerate(sims), key=lambda item: -item[1])
print(sims)

print(lda.get_document_topics(vec_bow))

Results

[(0, 0.9986434), (4, 0.99792993), (2, 0.99722278), (3, 0.99651831), (1, 0.9958639), (5, 0.53059661), (6, 0.4146674), (8, 0.38019019), (7, 0.36143348)] [(0, 0.18366596), (1, 0.81633401)]

[(1, 0.999605), (4, 0.9981864), (0, 0.998689), (5, 0.62957084), (6, 0.48837978), (8, 0.45152202), (3, 0.4541581), (7, 0.41751832), (2, 0.40637407)] [(0, 0.80285221), (1, 0.19714773)]

[(7, 0.99957085), (8, 0.99660784), (0, 0.99202132), (5, 0.78449017), (6, 0.77530348), (2, 0.56972337), (3, 0.47117239), (4, 0.47092015), (1, 0.4172135)] [(0, 0.25292286), (1, 0.7707717)]

In the last run, document 7 is ranked first, but it does not look similar to "Human computer interaction" at all. Thanks.

Different results when running the gensim example

0 answers: