Reading a large JSON file from disk

Asked: 2013-05-11 07:46:27

Tags: python json

I have a large JSON file, db.json (> 100 MB), that looks like this:

{"sitters": [["9919.html", 3, 8, 19, 47, 120, 129, 359]], "yellow": [["9945.html", 791], 
["9983.html", 1496], ["9984.html", 151]], "four": [["9971.html", 81, 403], ["9991.html", 37], 
["9995.html", 45, 225, 337], ["9975.html", 15], ["9978.html", 100], ["9948.html", 381], 
["9966.html", 228], ...

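The keys are words, and each value is a list of entries, each made up of a filename followed by the indices at which the word occurs in that file. I want to look up n words in this JSON file and retrieve their corresponding filenames and positions. Any idea how to do this efficiently given the file size? I have been looking at ijson, but I can't seem to get it to work. I tried: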

parser = parse("db.json")                                                             
for prefix, event, value in parser:                                                  
    if event == 'sitters':                                                           
        print value   

But I probably don't understand how to use it correctly, because it gives me the following error:

Traceback (most recent call last):
  File "retriever.py", line 43, in <module>
    sys.exit(main())
  File "retriever.py", line 38, in main
    for prefix, event, value in parser:
  File "/usr/local/lib/python2.7/dist-packages/ijson/common.py", line 63, in parse
    for event, value in basic_events:
  File "/usr/local/lib/python2.7/dist-packages/ijson/backends/yajl2.py", line 90, in basic_parse
    buffer = f.read(buf_size)
AttributeError: 'str' object has no attribute 'read'

Any help is greatly appreciated!

1 Answer:

Answer 0 (score: 4)

In this line you are trying to parse the string 'db.json' rather than the file 'db.json':

parser = parse("db.json")                                                             

As the error message shows, the line buffer = f.read(buf_size) raises this exception:

  

AttributeError: 'str' object has no attribute 'read'
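The same failure can be reproduced without ijson at all, which may make the cause easier to see. The following is a minimal sketch, assuming Python 3; `io.StringIO` is only an in-memory stand-in for an opened file, and the variable names are illustrative.

```python
import io

path = "db.json"                      # a str: it names a file, it is not a file
stream = io.StringIO('{"four": []}')  # in-memory stand-in for an opened file

# A file-like object has a read() method; a plain string does not.
path_is_readable = hasattr(path, "read")
stream_is_readable = hasattr(stream, "read")

try:
    path.read(1024)  # roughly what the parser attempts internally
except AttributeError as exc:
    message = str(exc)

print(path_is_readable, stream_is_readable)
print(message)
```

Anything that exposes read() should therefore be acceptable to the parser, which is why passing the result of open() resolves the error.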

The parse function expects a file object:

f = open('db.json', 'r')
parser = parse(f)

and close it once you are done:

f.close()

You can also use a with statement to handle opening and closing for you:

with open('db.json') as f:
    parser = parse(f)
    # consume the parser inside this block; the file is closed automatically when the block exits
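Once the file is opened correctly, note that the check event == 'sitters' from the question is unlikely to match: in the (prefix, event, value) triples that ijson yields, the word appears in prefix, while event is a type name such as 'start_array', 'string' or 'number'. Below is a rough sketch of how the lookup could be written on that basis. The helper names collect_postings and query_file are hypothetical, not part of ijson, and the code assumes the layout shown in the question (a top-level object mapping each word to a list of [filename, index, ...] lists) and that no word contains a dot, since ijson joins prefix parts with dots.

```python
def collect_postings(events, wanted):
    """Collect {word: [[filename, index, ...], ...]} from ijson-style events.

    `events` is any iterable of (prefix, event, value) triples.
    """
    remaining = set(wanted)
    found = {}
    for prefix, event, value in events:
        # "sitters.item.item" -> word "sitters", rest "item.item"
        word, _, rest = prefix.partition(".")
        if word not in remaining:
            continue
        if rest == "":
            if event == "start_array":
                found[word] = []            # the word's list of entries begins
            elif event == "end_array":
                remaining.discard(word)     # this word is complete
                if not remaining:
                    break                   # stop reading early
        elif rest == "item" and event == "start_array":
            found[word].append([])          # a new [filename, index, ...] entry
        elif rest == "item.item":
            found[word][-1].append(value)   # filename or index
    return found


def query_file(path, wanted):
    # ijson is a third-party package; it is imported here so that
    # collect_postings itself has no hard dependency on it.
    import ijson
    with open(path, "rb") as f:
        # The parser is lazy, so it must be consumed while f is still open.
        return collect_postings(ijson.parse(f), wanted)
```

A call such as query_file('db.json', ['sitters', 'yellow']) would then return only the entries for those two words, and words that are absent from the file are simply missing from the result. Stopping once every requested word has been seen avoids reading the rest of the file, although in the worst case the whole file is still scanned once.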