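I am trying to print the output of my script, but to do that I have to write a lot of print statements. Is there a way to output all the topics without writing every print by hand?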
import mglearn
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
dataset = pd.read_csv('text.csv', encoding='utf-8')
comments = dataset['comments']
# remove_small_words is not shown in the question; presumably a cleaned-up version of `comments`
comments_list = remove_small_words.values.tolist()
vector = CountVectorizer()
X = vector.fit_transform(comments_list)
lda = LatentDirichletAllocation(n_components = 30, learning_method = "batch", max_iter = 25, random_state = 0)
document_topics = lda.fit_transform(X)
sorting = np.argsort(lda.components_, axis = 1)[:, ::-1]
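# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
feature_names = np.array(vector.get_feature_names_out())
# print_topics prints the table itself and returns None, so there is nothing left to print afterwards
mglearn.tools.print_topics(topics=range(30), feature_names=feature_names, sorting=sorting, topics_per_chunk=5, n_words=10)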
print("Topic 0:")
docs = np.argsort(document_topics[:, 0])[::-1]
for i in docs[:]:
    # comments are already str in Python 3; calling .encode() would give bytes, and bytes.split(",") raises TypeError
    print(" ".join(comments_list[i].split(",")[:2]) + "\n")
print()
print()
print("Topic 1:")
docs = np.argsort(document_topics[:, 1])[::-1]
for i in docs[:]:
    print(" ".join(comments_list[i].split(",")[:2]) + "\n")
print()
print()
...
print("Topic 40:")
docs = np.argsort(document_topics[:, 40])[::-1]
for i in docs[:]:
    print(" ".join(comments_list[i].split(",")[:2]) + "\n")
print()
print()
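For example, can I print everything in a loop instead of writing the print block 40 times? Printing these 40 topics takes 240 lines of code. Imagine if I needed to print 100... This is the output I have, and I want to keep it:
Topic 0:
blabla
blabla
Topic 1:
blabla
blabla
Topic 3:
blabla
blabla
...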
Answer 0 (score: 3)
You can use string formatting to build the header string for each topic:
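# range() needs the number of topics; in the question, `topics` holds None (the return value of print_topics)
for i in range(lda.n_components):
    print("Topic {}:".format(i))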
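The loop only needs an integer topic count. A minimal sketch, using a hypothetical toy matrix in place of the question's `document_topics` (the output of `lda.fit_transform(X)`), takes that count from the matrix's number of columns, which is one column per topic:

```python
import numpy as np

# Hypothetical document-topic matrix: 4 documents x 3 topics
# (a stand-in for document_topics = lda.fit_transform(X) in the question)
document_topics = np.array([[0.7, 0.2, 0.1],
                            [0.1, 0.8, 0.1],
                            [0.3, 0.3, 0.4],
                            [0.5, 0.25, 0.25]])

# range() needs an int: the number of columns is the number of topics
n_topics = document_topics.shape[1]
headers = []
for i in range(n_topics):
    headers.append("Topic {}:".format(i))  # equivalent to f"Topic {i}:"
print("\n".join(headers))
```

Reading the count from the matrix keeps the loop correct if `n_components` is later changed from 30 to 40 or 100.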
Then, since you have `i`, you can add the remaining statements inside the same loop:
for i in range(lda.n_components):
    print("Topic {}:".format(i))
    # documents sorted by their weight for topic i, highest first
    docs = np.argsort(document_topics[:, i])[::-1]
    for j in docs[:]:
        print(" ".join(comments_list[j].split(",")[:2]) + "\n")
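To avoid many separate print calls entirely, the same loop can collect lines into a list and print once at the end. This is a minimal sketch, not the answer's exact code: the function name `format_topics`, the `n_docs` parameter, and the toy data are hypothetical, and the blank-line spacing is simplified to one empty line between topics:

```python
import numpy as np


def format_topics(document_topics, comments, n_docs=None):
    """Build the per-topic listing as one string.

    document_topics: (n_documents, n_topics) array, e.g. from lda.fit_transform(X)
    comments: list of str, one entry per document, in the same order as the rows
    n_docs: number of top documents to show per topic (None shows all)
    """
    lines = []
    for i in range(document_topics.shape[1]):
        lines.append("Topic {}:".format(i))
        # indices of documents sorted by weight for topic i, highest first
        docs = np.argsort(document_topics[:, i])[::-1]
        for j in docs[:n_docs]:
            # first two comma-separated pieces of the comment, as in the question
            lines.append(" ".join(comments[j].split(",")[:2]))
        lines.append("")  # blank line between topics
    return "\n".join(lines)


# Hypothetical toy data standing in for the LDA output and the CSV comments
toy_topics = np.array([[0.9, 0.1],
                       [0.2, 0.8],
                       [0.6, 0.4]])
toy_comments = ["good food,fast service,ok", "slow delivery,cold", "nice staff,clean"]
output = format_topics(toy_topics, toy_comments, n_docs=2)
print(output)
```

Returning a string rather than printing inside the function also makes it easy to write the result to a file instead of the console.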