How to test a trained NMF topic model on new text

Time: 2017-05-18 22:13:39

Tags: python scikit-learn nlp topic-modeling nmf

I built an NMF topic model in Python. The relevant code is below:

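from sklearn import decomposition
from sklearn.feature_extraction.text import TfidfVectorizer

def select_vectorizer(req_ngram_range=(1, 2)):
    # ngram_range must be a (min_n, max_n) tuple, so convert list input
    ngram_lengths = tuple(req_ngram_range)
    vectorizer = TfidfVectorizer(analyzer='word', ngram_range=ngram_lengths, stop_words='english', min_df=2)
    #print("User specified custom stopwords: {} ...".format(str(custom_stopwords)[1:-1]))
    return vectorizer

# new_review_list (a list of review strings), print_top_words and
# num_top_words are defined elsewhere and not shown here.
vectorizer = select_vectorizer([2,5])
X = vectorizer.fit_transform(new_review_list)

# Note: on scikit-learn 1.2+, `alpha` was replaced by alpha_W / alpha_H and
# get_feature_names() by get_feature_names_out().
clf = decomposition.NMF(n_components=20, random_state=3, alpha = .1).fit(X)
vocab = vectorizer.get_feature_names()
print_top_words(clf, vocab, num_top_words)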
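
The snippet calls `print_top_words` without showing its definition. As a rough sketch only, assuming the helper lists the highest-weighted terms in each row of `clf.components_` (the helper name `top_words_per_topic` is hypothetical and not from the original code), it might look like this:

```python
import numpy as np

def top_words_per_topic(components, feature_names, n_top_words):
    """Return, for each topic row, its n_top_words highest-weighted terms."""
    topics = []
    for weights in components:
        # Indices of the largest weights, in descending order
        top_idx = np.argsort(weights)[::-1][:n_top_words]
        topics.append([feature_names[i] for i in top_idx])
    return topics

def print_top_words(model, feature_names, n_top_words):
    # Assumed behaviour: one "Topic #k:" header followed by the term list
    topics = top_words_per_topic(model.components_, feature_names, n_top_words)
    for topic_idx, words in enumerate(topics):
        print("Topic #%d:" % topic_idx)
        print(words)
```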

This produced 20 topics, for example:

Topic #0:
[u'blocks available', u'delivery blocks available', u'notifications blocks', u'notifications blocks available', u'new blocks', u'know blocks available', u'new blocks available', u'know blocks', u'open blocks available', u'available work', u'zero blocks', u'like blocks', u'notification blocks', u'day blocks', u'slow blocks', u'10 blocks', u'option set', u'logged 10', u'notification blocks available', u'day blocks available']
Topic #1:
[u'amazon flex', u'working amazon', u'amazon flex app', u'working amazon flex', u'hello amazon', u'hello amazon flex', u'flex delivery', u'amazon flex delivery', u'flex team', u'amazon flex team', u'work amazon', u'amazon flex support', u'flex support', u'work amazon flex', u'deliver amazon', u'hi amazon flex', u'hi amazon', u'deliver amazon flex', u'signed amazon', u'love amazon'] and so on..

Now I want to run the model on new text and assign each new document to one of these topics. How do I do that?

0 answers:

No answers yet.
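
For reference, one common approach with scikit-learn is to reuse the already fitted objects: call `vectorizer.transform` (not `fit_transform`) on the new text, then `clf.transform` to get per-topic weights. The sketch below is a minimal illustration under stated assumptions: the tiny corpus, `new_docs`, and `n_components=2` are made up for the example, and taking the highest-weighted topic as the "category" is a heuristic rather than something the question confirms is wanted.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in corpus; in the question this would be new_review_list.
train_docs = [
    "no delivery blocks available today",
    "new delivery blocks available this morning",
    "notifications when delivery blocks available",
    "zero delivery blocks available again",
    "working with amazon flex app",
    "love the amazon flex app",
    "amazon flex app support team",
]

vectorizer = TfidfVectorizer(analyzer="word", ngram_range=(1, 2),
                             stop_words="english", min_df=2)
X = vectorizer.fit_transform(train_docs)
clf = NMF(n_components=2, random_state=3).fit(X)

# Hypothetical unseen documents
new_docs = [
    "are there any delivery blocks available",
    "question about the amazon flex app",
]

# Reuse the fitted vocabulary and IDF weights: transform only, never refit
X_new = vectorizer.transform(new_docs)

# Document-topic weights for the new text, shape (n_new_docs, n_topics)
W_new = clf.transform(X_new)

# Heuristic assignment: the topic with the largest weight per document
predicted_topic = W_new.argmax(axis=1)
print(predicted_topic)
```

A row of `W_new` that is all zeros means the new document shares no vocabulary with the training data, so its `argmax` is not meaningful and is worth checking for separately.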