gensim.scripts "No such file or directory"

Date: 2015-12-03 20:52:30

Tags: python cmd gensim

I'm trying to analyze a Wikipedia dump file. I'm using gensim.scripts, from the gensim Python library, and running this command in cmd.exe on Windows 10:

python -m gensim.scripts.make_wiki enwiki-latest-pages-articles.xml.bz2 wiki_en_output

This gives me the error:

Microsoft Windows [Version 10.0.10586]
(c) 2015 Microsoft Corporation. All rights reserved.

2015-12-03 15:47:20,459 : INFO : running C:\Python27\lib\site-packages\gensim-0.12.3-py2.7-win32.egg\gensim\scripts\make_wiki.py enwiki-latest-pages-articles.xml.bz2 wiki_en_output
Traceback (most recent call last):
  File "C:\Python27\lib\runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "C:\Python27\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "C:\Python27\lib\site-packages\gensim-0.12.3-py2.7-win32.egg\gensim\scripts\make_wiki.py", line 84, in <module>
    wiki = WikiCorpus(inp, lemmatize=lemmatize) # takes about 9h on a macbook pro, for 3.5m articles (june 2011)
  File "C:\Python27\lib\site-packages\gensim-0.12.3-py2.7-win32.egg\gensim\corpora\wikicorpus.py", line 270, in __init__
    self.dictionary = Dictionary(self.get_texts())
  File "C:\Python27\lib\site-packages\gensim-0.12.3-py2.7-win32.egg\gensim\corpora\dictionary.py", line 58, in __init__
    self.add_documents(documents, prune_at=prune_at)
  File "C:\Python27\lib\site-packages\gensim-0.12.3-py2.7-win32.egg\gensim\corpora\dictionary.py", line 119, in add_documents
    for docno, document in enumerate(documents):
  File "C:\Python27\lib\site-packages\gensim-0.12.3-py2.7-win32.egg\gensim\corpora\wikicorpus.py", line 290, in get_texts
    texts = ((text, self.lemmatize, title, pageid) for title, text, pageid in extract_pages(bz2.BZ2File(self.fname), self.filter_namespaces))
IOError: [Errno 2] No such file or directory: 'enwiki-latest-pages-articles.xml.bz2'

Any ideas on what I should do to fix this?

I'm on Windows 10, and gensim.scripts is installed.

1 Answer:

Answer 0 (score: 1)

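Just pass the full path to the downloaded enwiki-latest-pages-articles.xml.bz2, or run the gensim script from the folder you downloaded it to. The traceback shows that the name is handed straight to bz2.BZ2File, so a bare filename is only looked up in the current working directory of cmd.exe.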
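The same advice can be sketched in Python: resolve the dump to an absolute path (checking the current directory first, as bz2.BZ2File effectively does, then a few candidate folders) and fail early with a clear message before starting the long make_wiki run. This is a minimal sketch under assumptions: the helper names resolve_dump_path and make_wiki_command are hypothetical and not part of gensim, the search folders are whatever you pass in, and the code is written to run on both Python 2.7 and Python 3.

```python
import errno
import os
import sys


def resolve_dump_path(dump_name, search_dirs=()):
    """Return an absolute path to the dump file, or raise IOError (ENOENT).

    Hypothetical helper: a relative name is tried against the current
    working directory first (the same place bz2.BZ2File would look),
    then against each directory in search_dirs.
    """
    candidates = [os.path.abspath(dump_name)]
    for directory in search_dirs:
        candidates.append(os.path.abspath(os.path.join(directory, dump_name)))
    for path in candidates:
        if os.path.isfile(path):
            return path
    # Same errno as the original traceback, but listing every location tried.
    raise IOError(errno.ENOENT,
                  "dump not found; tried: " + ", ".join(candidates),
                  dump_name)


def make_wiki_command(dump_path, output_prefix):
    """Hypothetical helper: the command from the question, with an absolute input path."""
    return [sys.executable, "-m", "gensim.scripts.make_wiki",
            dump_path, output_prefix]
```

For example, assuming the browser saved the dump to a Downloads folder under your home directory, you could call `subprocess.check_call(make_wiki_command(resolve_dump_path("enwiki-latest-pages-articles.xml.bz2", [os.path.join(os.path.expanduser("~"), "Downloads")]), "wiki_en_output"))`. Checking up front matters because, per the comment in make_wiki.py, the WikiCorpus step can run for hours.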

If you don't have that archive yet, you can find and download it from the Wikimedia dumps site.