Unable to create a dictionary with gensim

Time: 2019-06-10 10:15:35

Tags: dictionary gensim topic-modeling

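I am using gensim to build a dictionary from documents that are each a list of tokens. Every attempt fails with the error "'NoneType' object is not iterable".

I have already converted the documents into lists of tokens, but it still does not work.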

processed_docs = data.Text.map(preprocess)
processed_docs[:10]

0        [buy]
1    [product]
2    [confect]
3       [look]
4      [great]
5       [wild]
6    [saltwat]
7      [taffi]
8      [right]
9    [healthi]
Name: Text, dtype: object

from gensim.corpora import Dictionary
dct=Dictionary(processed_docs)
#dct.add_documents(processed_list)


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-99-adca375e0537> in <module>()
      1 #bagofwords
      2 from gensim.corpora import Dictionary
----> 3 dct=Dictionary(processed_docs)
      4 #dct.add_documents(processed_list)
      5 #dct

/home/venv/local/lib/python2.7/site-packages/gensim/corpora/dictionary.pyc in __init__(self, documents, prune_at)
     82 
     83         if documents is not None:
---> 84             self.add_documents(documents, prune_at=prune_at)
     85 
     86     def __getitem__(self, tokenid):

/home/venv/local/lib/python2.7/site-packages/gensim/corpora/dictionary.pyc in add_documents(self, documents, prune_at)
    203 
    204             # update Dictionary with the document
--> 205             self.doc2bow(document, allow_update=True)  # ignore the result, here we only care about updating token ids
    206 
    207         logger.info(

/home/venv/local/lib/python2.7/site-packages/gensim/corpora/dictionary.pyc in doc2bow(self, document, allow_update, return_missing)
    247         # Construct (word, frequency) mapping.
    248         counter = defaultdict(int)
--> 249         for w in document:
    250             counter[w if isinstance(w, unicode) else unicode(w, 'utf-8')] += 1
    251 

TypeError: 'NoneType' object is not iterable
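
The traceback fails at `for w in document` inside `doc2bow`, so at least one element of `processed_docs` is `None` rather than a list of tokens. One possible cause (an assumption, since `preprocess` is not shown) is that `preprocess` returns `None` for some rows, or that `data.Text` contains missing values. Below is a minimal sketch for locating and dropping such entries. The helper names and `sample_docs` are hypothetical stand-ins, not part of the original code.

```python
def find_invalid_docs(docs):
    """Return the positions of entries that are not a list/tuple of tokens."""
    return [i for i, doc in enumerate(docs) if not isinstance(doc, (list, tuple))]


def drop_invalid_docs(docs):
    """Keep only entries that are a list/tuple of tokens."""
    return [doc for doc in docs if isinstance(doc, (list, tuple))]


# Hypothetical stand-in for processed_docs; the None mimics a row for which
# preprocess returned nothing.
sample_docs = [["buy"], ["product"], None, ["look"], ["great"]]

bad_positions = find_invalid_docs(sample_docs)
clean_docs = drop_invalid_docs(sample_docs)

print(bad_positions)  # [2]
print(clean_docs)
```

With the real data, `find_invalid_docs(processed_docs)` should point at the offending rows, and passing `drop_invalid_docs(processed_docs)` to `Dictionary(...)` should avoid this `TypeError`. If many rows are affected, fixing `preprocess` so that it always returns a list is probably the cleaner option.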

I want to initialize the dictionary with the tokens in processed_docs so that it can represent each word together with its frequency.
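
To make the goal concrete, here is a stdlib-only sketch of the kind of token-to-frequency mapping described. It uses `collections.Counter` as a stand-in and is not how gensim implements `Dictionary`; `sample_docs` is hypothetical data. In gensim itself, `Dictionary.token2id` holds the token-to-id mapping and `Dictionary.doc2bow(doc)` returns `(token_id, count)` pairs for a single document.

```python
from collections import Counter

# Hypothetical tokenized documents standing in for processed_docs.
sample_docs = [["buy", "product"], ["product", "great"], ["great", "product"]]

# Count how often each token occurs across all documents.
token_counts = Counter()
for doc in sample_docs:
    token_counts.update(doc)

print(token_counts["product"])  # 3
```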

0 answers:

No answers yet.