Question

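I'm new to Stack Overflow and to Python, so please bear with me. I'm trying to run Latent Dirichlet Allocation on a text corpus with the gensim package in Python, using the PyCharm editor. I prepared the corpus in R and exported it to a CSV file with this R command: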

write.csv(testdf, "C://...//test.csv", fileEncoding = "utf-8")

This produces the following CSV structure (though the real texts are much longer and already preprocessed):

,"datetimestamp","id","origin","text"
1,"1960-01-01","id_1","Newspaper1","Test text one"
2,"1960-01-02","id_2","Newspaper1","Another text"
3,"1960-01-03","id_3","Newspaper1","Yet another text"
4,"1960-01-04","id_4","Newspaper2","Four Five Six"
5,"1960-01-05","id_5","Newspaper2","Alpha Bravo Charly"
6,"1960-01-06","id_6","Newspaper2","Singing Dancing Laughing"

I then tried the following basic Python code (based on the gensim tutorials) to run a simple LDA analysis:

import gensim
from gensim import corpora, models, similarities, parsing
import pandas as pd
from six import iteritems
import os
import pyLDAvis.gensim

class MyCorpus(object):
    def __iter__(self):
        for row in pd.read_csv('//mpifg.local/dfs/home/lu/Meine Daten/Imagined Futures and Greek State Bonds/Topic Modelling/Python/test.csv', index_col=False, header=0, encoding='utf-8')['text']:
            # assume there's one document per line, tokens separated by whitespace
            yield dictionary.doc2bow(row.split())

if __name__ == '__main__':
    dictionary = corpora.Dictionary(row.split() for row in pd.read_csv(
        '//.../test.csv', index_col=False, encoding='utf-8')['text'])
    print(dictionary)
    dictionary.save(
        '//.../greekdict.dict')  # store the dictionary, for future reference

    ## create an mmCorpus
    corpora.MmCorpus.serialize('//.../greekcorpus.mm', MyCorpus())
    corpus = corpora.MmCorpus('//.../greekcorpus.mm')

    dictionary = corpora.Dictionary.load('//.../greekdict.dict')
    corpus = corpora.MmCorpus('//.../greekcorpus.mm')

    # train model
    lda = gensim.models.ldamodel.LdaModel(corpus=corpus, id2word=dictionary, num_topics=50, iterations=1000)

I get the following error output and the code exits:

...\Python\venv\lib\site-packages\setuptools-28.8.0-py3.6.egg\pkg_resources\_vendor\pyparsing.py:832: DeprecationWarning: invalid escape sequence \d

\...\Python\venv\lib\site-packages\setuptools-28.8.0-py3.6.egg\pkg_resources\_vendor\pyparsing.py:2766: DeprecationWarning: invalid escape sequence \d

\...\Python\venv\lib\site-packages\setuptools-28.8.0-py3.6.egg\pkg_resources\_vendor\pyparsing.py:2914: DeprecationWarning: invalid escape sequence \g

\...\Python\venv\lib\site-packages\pyLDAvis\_prepare.py:387: DeprecationWarning: .ix is deprecated. Please use .loc for label based indexing or .iloc for positional indexing

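I can't find any solution and, to be honest, I have no clue where exactly the problem comes from. I spent hours making sure that the CSV is encoded as UTF-8 and that it is exported (from R) and imported (in Python) correctly.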

What am I doing wrong, or where else could I look? Cheers!

Answer 1

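A DeprecationWarning is exactly that: a warning that a feature is deprecated. It is meant to prompt users to switch to an alternative so that their code stays compatible with future versions. So in your case I would simply keep an eye on updates to the libraries you use.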
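To make the point concrete, here is a minimal sketch using only the standard library. The function `old_api` is hypothetical and merely stands in for any library call that emits a DeprecationWarning; the sketch shows that the call still returns its result and the warning is only reported.

```python
import warnings


def old_api(x):
    # Hypothetical deprecated function: it warns, then still does its job.
    warnings.warn("old_api is deprecated, use new_api instead",
                  DeprecationWarning, stacklevel=2)
    return x * 2


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    result = old_api(21)

print(result)                        # the call completed normally
print(caught[0].category.__name__)   # the warning was recorded, nothing was raised
```

If the messages are merely distracting, calling `warnings.filterwarnings("ignore", category=DeprecationWarning)` before the imports that trigger them should hide them, although that also hides hints you may want to see later.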

Starting with the last warning: it appears to originate from pandas, and it has already been reported for pyLDAvis.
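If it is unclear which call in your own script leads to such a warning, one option is to temporarily promote DeprecationWarning to an exception and read the resulting traceback. This is only a sketch: `noisy_dependency` is a hypothetical stand-in for third-party code, not anything from pandas or pyLDAvis.

```python
import traceback
import warnings


def noisy_dependency():
    # Hypothetical stand-in for library code that emits the warning.
    warnings.warn(".ix is deprecated", DeprecationWarning)


trace = None
with warnings.catch_warnings():
    # Inside this block, a DeprecationWarning is raised like an exception.
    warnings.simplefilter("error", DeprecationWarning)
    try:
        noisy_dependency()
    except DeprecationWarning:
        # The traceback lists every frame between your code and the warning.
        trace = traceback.format_exc()

print(trace)
```

The same effect is available for a whole script, without editing it, by running it as `python -W error::DeprecationWarning script.py`.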

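The rest come from the pyparsing module, which you don't seem to import explicitly. Probably one of the libraries you use depends on it and relies on some relatively old, deprecated functionality. To get rid of the warnings at startup, I would check whether upgrading helps. Good luck!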
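For context on what "invalid escape sequence" means: since Python 3.6, a backslash sequence such as `\d` in an ordinary string literal triggers a DeprecationWarning, because it is not a recognized string escape. Library code presumably uses such sequences in regular expressions, and the usual fix on the library's side is a raw string. A small sketch, reusing a row from the CSV in the question:

```python
import re

# r"..." keeps the backslash literally, so \d reaches the regex engine untouched
# and Python has no unrecognized string escape to warn about.
date_pattern = re.compile(r"\d{4}-\d{2}-\d{2}")

row = '1,"1960-01-01","id_1","Newspaper1","Test text one"'
match = date_pattern.search(row)
print(match.group(0))

# Doubling the backslash produces the same string without a raw literal.
same = (r"\d" == "\\d")
print(same)
```

Since the affected literals live inside pyparsing, this is not something to change in your own script; it only explains why the warning is harmless for your analysis.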

Python LDA gensim "DeprecationWarning: invalid escape sequence"
