Python LDA gensim "DeprecationWarning: invalid escape sequence"

Time: 2018-03-20 16:02:48

Tags: r python-3.x export-to-csv gensim deprecation-warning

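I'm new to Stack Overflow and Python, so please bear with me. I'm trying to run Latent Dirichlet Allocation on a text corpus in Python with the gensim package, using the PyCharm editor. I prepared the corpus in R and exported it to a CSV file with this R command: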

write.csv(testdf, "C://...//test.csv", fileEncoding = "utf-8") 

This creates the following CSV structure (though with much longer, already preprocessed texts):

,"datetimestamp","id","origin","text"
1,"1960-01-01","id_1","Newspaper1","Test text one"
2,"1960-01-02","id_2","Newspaper1","Another text"
3,"1960-01-03","id_3","Newspaper1","Yet another text"
4,"1960-01-04","id_4","Newspaper2","Four Five Six"
5,"1960-01-05","id_5","Newspaper2","Alpha Bravo Charly"
6,"1960-01-06","id_6","Newspaper2","Singing Dancing Laughing"

Then I tried the following basic Python code (based on the gensim tutorials) to perform a simple LDA analysis:

import gensim
from gensim import corpora, models, similarities, parsing
import pandas as pd
from six import iteritems
import os
import pyLDAvis.gensim

class MyCorpus(object):
    def __iter__(self):
        # re-read the CSV each time the corpus is iterated and convert every text to bag-of-words
        for row in pd.read_csv('//mpifg.local/dfs/home/lu/Meine Daten/Imagined Futures and Greek State Bonds/Topic Modelling/Python/test.csv', index_col=False, header=0, encoding='utf-8')['text']:
            # assume there's one document per line, tokens separated by whitespace
            # (relies on the module-level `dictionary` created in the __main__ block below)
            yield dictionary.doc2bow(row.split())

if __name__ == '__main__':
    # build the token -> id mapping from the whitespace-tokenized texts
    dictionary = corpora.Dictionary(row.split() for row in pd.read_csv(
        '//.../test.csv', index_col=False, encoding='utf-8')['text'])
    print(dictionary)
    dictionary.save(
        '//.../greekdict.dict')  # store the dictionary, for future reference

    ## create an mmCorpus
    corpora.MmCorpus.serialize('//.../greekcorpus.mm', MyCorpus())
    corpus = corpora.MmCorpus('//.../greekcorpus.mm')

    # reload the saved dictionary and corpus from disk
    dictionary = corpora.Dictionary.load('//.../greekdict.dict')
    corpus = corpora.MmCorpus('//.../greekcorpus.mm')

    # train model
    lda = gensim.models.ldamodel.LdaModel(corpus=corpus, id2word=dictionary, num_topics=50, iterations=1000)

I get the following messages, and then the program exits:

...\Python\venv\lib\site-packages\setuptools-28.8.0-py3.6.egg\pkg_resources\_vendor\pyparsing.py:832: DeprecationWarning: invalid escape sequence \d

...\Python\venv\lib\site-packages\setuptools-28.8.0-py3.6.egg\pkg_resources\_vendor\pyparsing.py:2766: DeprecationWarning: invalid escape sequence \d

...\Python\venv\lib\site-packages\setuptools-28.8.0-py3.6.egg\pkg_resources\_vendor\pyparsing.py:2914: DeprecationWarning: invalid escape sequence \g

...\Python\venv\lib\site-packages\pyLDAvis\_prepare.py:387: DeprecationWarning: .ix is deprecated. Please use .loc for label based indexing or .iloc for positional indexing

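I couldn't find any solution and, to be honest, I have no clue where exactly the problem comes from. I spent hours making sure that the CSV is encoded in UTF-8 and that it is exported (from R) and imported (in Python) correctly.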

What am I doing wrong, or where else could I look? Cheers!

1 answer:

Answer 0 (score: 0)

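A DeprecationWarning is just that: a warning that some feature is deprecated, meant to nudge users toward an alternative so their code stays compatible with future versions. It is not an error and does not by itself stop your script. So in your case, I would simply keep an eye on updates to the libraries you use.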
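To illustrate that a DeprecationWarning is only reported and does not interrupt execution, here is a minimal sketch using only the standard warnings module. The function old_api is a hypothetical stand-in for a deprecated library function, not something from gensim or pyLDAvis. The sketch also shows how to silence deprecation warnings for a single block with warnings.catch_warnings() rather than globally.

```python
import warnings


def old_api():
    """Hypothetical deprecated function, standing in for library code."""
    warnings.warn("old_api is deprecated; use new_api", DeprecationWarning, stacklevel=2)
    return "result"


def run_with_warnings_recorded():
    # Record warnings instead of printing them, so they can be inspected.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is filtered out
        value = old_api()  # execution continues normally after the warning
    return value, [w.category for w in caught]


def run_silenced():
    # Ignore DeprecationWarning only inside this block; the previous
    # filter settings are restored when the block exits.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=DeprecationWarning)
        return old_api()


if __name__ == "__main__":
    value, categories = run_with_warnings_recorded()
    print(value, categories)  # the call still returned its value
    print(run_silenced())
```

Scoping the filter with catch_warnings keeps other, possibly useful, deprecation warnings visible elsewhere in the script.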

The last warning is raised by pandas: pyLDAvis uses the deprecated .ix indexer, and this has been documented for pyLDAvis here.
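For reference, this is what the pandas message asks for: replace .ix, which mixed label-based and positional lookup, with .loc for labels or .iloc for positions. (.ix was removed entirely in pandas 1.0.) The fix itself belongs in pyLDAvis's _prepare.py, so in practice upgrading pyLDAvis is presumably the way to get it. The DataFrame below is made-up data loosely resembling the question's CSV.

```python
import pandas as pd

# Hypothetical data resembling the question's CSV, indexed by the "id" column.
df = pd.DataFrame(
    {"origin": ["Newspaper1", "Newspaper1", "Newspaper2"],
     "text": ["Test text one", "Another text", "Four Five Six"]},
    index=["id_1", "id_2", "id_4"],
)

# Label-based lookup: row label "id_2", column label "text".
by_label = df.loc["id_2", "text"]

# Position-based lookup: second row (position 1), second column (position 1).
by_position = df.iloc[1, 1]

print(by_label, "|", by_position)
```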

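The rest come from the pyparsing module, which you don't import explicitly. Judging by the path, it is the copy of pyparsing bundled (vendored) inside the pkg_resources package of setuptools 28.8.0, presumably imported by one of the libraries you use. That old copy contains string literals with escape sequences that newer Python versions flag as deprecated, so these warnings have nothing to do with your CSV or its encoding. To get rid of these startup warnings, I would check whether upgrading (setuptools in particular) helps. Good luck!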
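The "invalid escape sequence \d" warnings are emitted when Python compiles a regular (non-raw) string literal containing a backslash sequence it does not recognize, such as "\d" in a regular expression. They are raised by the code in the offending file at compile time, not by your data. The sketch below reproduces this with compile() and shows that a raw string avoids it. Python 3.6 to 3.11 report this as DeprecationWarning; Python 3.12 and later report a SyntaxWarning instead. The helper escape_warnings is hypothetical, written only for this illustration.

```python
import warnings


def escape_warnings(source):
    """Compile `source` and return the warnings Python emits while doing so."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<example>", "exec")
    return caught


# A regular string literal: "\d" is not a recognised escape sequence.
bad = 'pattern = "\\d+"'
# A raw string literal: the backslash is kept as-is, so no warning.
good = 'pattern = r"\\d+"'

bad_warnings = escape_warnings(bad)
good_warnings = escape_warnings(good)

for w in bad_warnings:
    # DeprecationWarning on Python 3.6-3.11, SyntaxWarning on 3.12+
    print(w.category.__name__, w.message)
print(len(good_warnings))
```

If you want to trace which import chain triggers one of these warnings, running the script as python -W error::DeprecationWarning your_script.py (the script name is a placeholder) turns matching warnings into exceptions, so you get a full traceback instead of a one-line message.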