Reading a JSON file created by gensim from a Wikipedia dump

Time: 2019-08-18 09:53:41

Tags: json gensim wikipedia

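Problem using gensim to read the JSON dump created from a Wikipedia dump

I am trying to read a Wikipedia dump and create a JSON file by following the instructions at this link.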

https://radimrehurek.com/gensim/scripts/segment_wiki.html

But the code fails.

The code I am running is given below:

from gensim import utils
import json

# iterate over the plain text data we just created
with utils.open('D:\\enwiki-latest.json.gz', 'rb') as f:
    for line in f:
        # decode each JSON line into a Python dictionary object
        article = json.loads(line)

        # each article has a "title", a mapping of interlinks and a list of "section_titles" and
        # "section_texts".
        print("Article title: %s" % article['title'])
        # use % formatting here; concatenating a str and a mapping with + raises a TypeError
        print("Interlinks: %s" % (article['interlinks'],))
        for section_title, section_text in zip(article['section_titles'], article['section_texts']):
            print("Section text: %s" % section_text)

The stack trace is as follows:

AttributeError                            Traceback (most recent call last)
<ipython-input-1-8b1125fd41d0> in <module>()
      3 
      4  # iterate over the plain text data we just created
----> 5 with utils.open('D:\\enwiki-latest.json.gz', 'rb') as f:
      6     for line in f:
      7       # decode each JSON line into a Python dictionary object

AttributeError: module 'gensim.utils' has no attribute 'open'

Please help me understand what is wrong with my code.

I am running this code with Anaconda on a Windows 10 machine.
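One likely cause, judging only from the error message, is that the installed gensim release is older than the documentation being followed, so `gensim.utils` does not yet expose `open`. Upgrading gensim, or calling `smart_open.open` directly, may resolve it. As a version-independent workaround, the sketch below reads the same kind of file with only the standard library. It assumes the file is gzip-compressed with one JSON object per line, as the linked page describes. The helper names `iter_articles`, `describe`, and `write_sample`, and the sample article, are made up for illustration.

```python
import gzip
import json
import os
import tempfile


def iter_articles(path):
    """Yield one article dict per non-empty line of a gzip-compressed JSON-lines file."""
    # Text mode with an explicit encoding, so json.loads always receives str.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def describe(article):
    """Return the printable lines for one article (hypothetical helper)."""
    lines = ["Article title: %s" % article["title"]]
    # Wrap the value in a tuple so % formatting works for any container type.
    lines.append("Interlinks: %s" % (article.get("interlinks"),))
    for section_title, section_text in zip(article["section_titles"],
                                           article["section_texts"]):
        lines.append("Section title: %s" % section_title)
        lines.append("Section text: %s" % section_text)
    return lines


def write_sample(path):
    """Write a tiny made-up file in the assumed format, for demonstration only."""
    sample = {
        "title": "Example",
        "section_titles": ["Introduction"],
        "section_texts": ["Some text."],
        "interlinks": {"Other article": "other"},
    }
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write(json.dumps(sample) + "\n")


if __name__ == "__main__":
    # Replace demo_path with the real file, e.g. 'D:\\enwiki-latest.json.gz'.
    demo_path = os.path.join(tempfile.mkdtemp(), "sample.json.gz")
    write_sample(demo_path)
    for article in iter_articles(demo_path):
        for text in describe(article):
            print(text)
```

Because `iter_articles` is a generator, the articles are decoded one at a time and the full dump never has to fit in memory.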

0 answers:

No answers yet