I trained an LDA model with the gensim library and use it to extract topic vectors for documents, using the code below:
from nltk.stem.porter import PorterStemmer
from nltk.tokenize import RegexpTokenizer

# en_stop is assumed to be defined elsewhere as a collection of English stop words

def clean_doc(data_string):
    global en_stop
    tokenizer = RegexpTokenizer(r'\w+')  # Create appropriate tokenizer
    p_stemmer = PorterStemmer()  # Create object from Porter Stemmer
    # clean and tokenize document string
    raw = data_string.lower()
    tokens = tokenizer.tokenize(raw)
    # remove stop words from tokens
    stopped_tokens = [i for i in tokens if i not in en_stop]
    # stem tokens
    stemmed_tokens = [p_stemmer.stem(i) for i in stopped_tokens]
    return stemmed_tokens

def infer_lda_vector(s, dictionary, model, dimensions):
    #s = s.decode('utf-8')
    vector = [0.0]*dimensions
    s = clean_doc(s)
    bow_vector = dictionary.doc2bow(s)
    lda_vector = model[bow_vector]
    for i in lda_vector:
        vector[i[0]] = i[1]
    return vector
I call it like this:
text = "this a test"
lda_vector = infer_lda_vector(text, dictionary, lda_model, 300)
This exact code worked when I was on Python 2.7, but after I updated my system to Python 3.x it throws the following error:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-36-723f03d03620> in <module>()
      1 text = "this a a test"
----> 2 lda_vector = infer_lda_vector(text, dictionary, lda_model, 300)
      3 lda_vector

<ipython-input-34-885205b68d9e> in infer_lda_vector(s, dictionary, model, dimensions)
     34     s = clean_doc(s)
     35     bow_vector = dictionary.doc2bow(s)
---> 36     lda_vector = model[bow_vector]
     37     for i in lda_vector:
     38         vector[i[0]] = i[1]

C:\ProgramData\Anaconda3\lib\site-packages\gensim\models\ldamodel.py in __getitem__(self, bow, eps)
   1158             `(topic_id, topic_probability)` 2-tuples.
   1159         """
-> 1160         return self.get_document_topics(bow, eps, self.minimum_phi_value, self.per_word_topics)
   1161
   1162     def save(self, fname, ignore=('state', 'dispatcher'), separately=None, *args, **kwargs):

C:\ProgramData\Anaconda3\lib\site-packages\gensim\models\ldamodel.py in get_document_topics(self, bow, minimum_probability, minimum_phi_value, per_word_topics)
    979         if minimum_probability is None:
    980             minimum_probability = self.minimum_probability
--> 981         minimum_probability = max(minimum_probability, 1e-8)  # never allow zero values in sparse output
    982
    983         if minimum_phi_value is None:

TypeError: '>' not supported between instances of 'float' and 'NoneType'
What am I doing wrong?
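For reference, the last frame of the traceback can be reproduced without gensim. This is a minimal sketch, assuming `minimum_probability` is still `None` when line 981 runs, as the error message suggests. `clamp_probability` is a hypothetical stand-in for that line, not gensim code. On Python 2, `None` compared as smaller than any number, so the same `max` call would have returned `1e-8` without complaint, which would explain why the failure only shows up after the upgrade.

```python
def clamp_probability(minimum_probability):
    """Hypothetical stand-in for the failing line shown in the traceback."""
    # Same expression as ldamodel.py line 981 in the traceback.
    return max(minimum_probability, 1e-8)


def fails_with_type_error(value):
    """Return True if clamping `value` raises TypeError."""
    try:
        clamp_probability(value)
    except TypeError:
        return True
    return False


# A float threshold is clamped normally; None cannot be compared on Python 3.
print(clamp_probability(0.01))
print(fails_with_type_error(None))
```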
Answer 0 (score: 0)
Clean up with conda and reinstall it.
conda clean -t
conda install gensim
My guess is that a corrupted version had been installed, and the clean command removed it before the reinstall.
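If reinstalling does not help, it may be worth checking the loaded model itself. The traceback shows `get_document_topics` falling back to `self.minimum_probability`, so a model whose attribute is `None` would produce exactly this error. One possible cause, which is only an assumption here, is a model saved by an older gensim release. A minimal sketch: `ensure_min_probability` is a hypothetical helper, and the `0.01` default is assumed to match gensim's default for `LdaModel`.

```python
from types import SimpleNamespace


def ensure_min_probability(model, default=0.01):
    """Hypothetical helper: give the model a float threshold if it has none."""
    if getattr(model, "minimum_probability", None) is None:
        model.minimum_probability = default
    return model.minimum_probability


# Stand-in object so the sketch runs without gensim; in practice pass lda_model.
fake_model = SimpleNamespace(minimum_probability=None)
print(ensure_min_probability(fake_model))
```

After calling it on `lda_model`, retry `infer_lda_vector(text, dictionary, lda_model, 300)`.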