I am implementing LDA for topic modeling of radiology reports. Here is the code I am using:
import pandas as pd
data = pd.read_csv('chest_ct.csv', error_bad_lines=False)
import gensim
def preprocess(text):
    result = []
    for token in gensim.utils.simple_preprocess(text):
        if token not in gensim.parsing.preprocessing.STOPWORDS and len(token) > 3:
            result.append(token)
    return result
processed_docs = []
for doc in data:
    processed_docs.append(preprocess(doc))
dictionary = gensim.corpora.Dictionary(processed_docs)
lda_model = gensim.models.LdaMulticore(data,
                                       num_topics=8,
                                       id2word=dictionary,
                                       passes=2,
                                       workers=2)
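The code gets through building the dictionary and then fails at the lda_model step. I also tried implementing LDA with scikit-learn following this link, https://pythonhosted.org/lda/, and got the same error again. Can anyone help?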