Loading a large JSON file in Python - Error = JSONDecodeError: Extra data

Asked: 2017-10-08 02:21:02

Tags: python json yelp

  

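I am trying to load the file business.json from Yelp's academic dataset in Python. The data is available as part of their academic challenge, see the documentation (https://www.yelp.com/dataset/documentation/json). My goal is to extract all restaurants and their IDs, then find the one restaurant I am interested in. Once I have that restaurant's ID, I want to load review.json and extract all reviews for that restaurant. Sadly, I am stuck at the initial stage of loading the .json file.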

This is what business.json looks like:

{
    // string, 22 character unique string business id
    "business_id": "tnhfDv5Il8EaGSXZGiuQGg",

    // string, the business's name
    "name": "Garaje",

    // string, the neighborhood's name
    "neighborhood": "SoMa",

    // string, the full address of the business
    "address": "475 3rd St",

    // string, the city
    "city": "San Francisco",

    // string, 2 character state code, if applicable
    "state": "CA",

    // string, the postal code
    "postal code": "94107",

    // float, latitude
    "latitude": 37.7817529521,

    // float, longitude
    "longitude": -122.39612197,

    // float, star rating, rounded to half-stars
    "stars": 4.5,

    // integer, number of reviews
    "review_count": 1198,

    // integer, 0 or 1 for closed or open, respectively
    "is_open": 1,

    // object, business attributes to values. note: some attribute values might be objects
    "attributes": {
        "RestaurantsTakeOut": true,
        "BusinessParking": {
            "garage": false,
            "street": true,
            "validated": false,
            "lot": false,
            "valet": false
        },
    },

    // an array of strings of business categories
    "categories": [
        "Mexican",
        "Burgers",
        "Gastropubs"
    ],

    // an object of key day to value hours, hours are using a 24hr clock
    "hours": {
        "Monday": "10:00-21:00",
        "Tuesday": "10:00-21:00",
        "Friday": "10:00-21:00",
        "Wednesday": "10:00-21:00",
        "Thursday": "10:00-21:00",
        "Sunday": "11:00-18:00",
        "Saturday": "10:00-21:00"
    }
}

When I try to import business.json with the following code:

import json

jsonBus = json.loads(open('business.json').read())
for item in jsonBus:
    name = item.get("Name")
    businessID = item.get("business_id")

I get the following error:

runfile('/Users/Nico/Google Drive/Python/yelp/yelp_academic.py', wdir='/Users/Nico/Google Drive/Python/yelp')
Traceback (most recent call last):

  File "<ipython-input-46-68ba9d6458bc>", line 1, in <module>
    runfile('/Users/Nico/Google Drive/Python/yelp/yelp_academic.py', wdir='/Users/Nico/Google Drive/Python/yelp')

  File "/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 710, in runfile
    execfile(filename, namespace)

  File "/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 101, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "/Users/Nico/Google Drive/Python/yelp/yelp_academic.py", line 3, in <module>
    jsonBus = json.loads(open('business.json').read())

  File "/anaconda3/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)

  File "/anaconda3/lib/python3.6/json/decoder.py", line 342, in decode
    raise JSONDecodeError("Extra data", s, end)

JSONDecodeError: Extra data

Does anyone know why this error occurs?

I am also open to any smarter way of going about this!

Best,

Nico

2 Answers:

Answer 0: (score: 1)

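If your JSON file looks exactly like what you posted, it should not contain comments (such as // string, 22 character unique string business id), because comments are not part of the JSON standard.

See the related discussion here: Can comments be used in JSON?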
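To illustrate the point about comments, here is a minimal sketch using a made-up two-line document in the style of the sample above (the strings and the `try_parse` helper are hypothetical, not part of the dataset). Python's standard `json` module rejects the `//` comment, although the message it reports in that case is not "Extra data", which suggests comments may not be what triggers the error in the question.

```python
import json

# A tiny document in the style of the documentation sample, with a // comment.
with_comment = '''{
    // string, the business's name
    "name": "Garaje"
}'''

# The same document with the comment removed.
without_comment = '{"name": "Garaje"}'

def try_parse(text):
    """Return the parsed object, or the JSONDecodeError message if parsing fails."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        return exc.msg

comment_result = try_parse(with_comment)    # an error message string
clean_result = try_parse(without_comment)   # a dict
print(comment_result)
print(clean_result)
```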

Answer 1: (score: 0)

I think this works. I am using the same dataset and ran into a similar error. I found a comment here that seems to work:

import json

# Parse each line of the file as its own JSON document.
with open('business.json') as f:
    js = [json.loads(line) for line in f]

for item in js:
    name = item.get("name")
    businessID = item.get("business_id")
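For the larger goal in the question (find the restaurants, then pull the reviews for one of them), the same line-by-line idea can be applied without building a full list first. The following is only a sketch under some assumptions: both files hold one JSON object per line, restaurants carry a "Restaurants" entry in "categories", and review records have a "business_id" field. The function names and the sample lines are made up for illustration.

```python
import json

def iter_records(lines):
    """Yield one parsed object per non-empty line (one JSON object per line)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

def find_restaurants(lines):
    """Return {business_id: name} for records whose categories include 'Restaurants'."""
    restaurants = {}
    for record in iter_records(lines):
        categories = record.get("categories") or []  # tolerate a missing/null field
        if "Restaurants" in categories:
            restaurants[record["business_id"]] = record.get("name")
    return restaurants

def reviews_for(lines, business_id):
    """Return the review records that belong to one business."""
    return [r for r in iter_records(lines) if r.get("business_id") == business_id]

# Made-up sample lines standing in for business.json and review.json.
business_lines = [
    '{"business_id": "b1", "name": "Garaje", "categories": ["Mexican", "Restaurants"]}',
    '{"business_id": "b2", "name": "Some Garage", "categories": ["Auto Repair"]}',
]
review_lines = [
    '{"review_id": "r1", "business_id": "b1", "stars": 5}',
    '{"review_id": "r2", "business_id": "b2", "stars": 3}',
]

restaurants = find_restaurants(business_lines)
garaje_reviews = reviews_for(review_lines, "b1")
print(restaurants)
print(garaje_reviews)
```

With the real files, an open file object can be passed in place of the lists, for example `with open('business.json') as f: restaurants = find_restaurants(f)`, since iterating over a file yields its lines.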

However, I would still like to know why json.loads() did not work. The file itself looks fine.
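One likely explanation, assuming the file stores one JSON object per line (which is what the line-by-line fix working suggests): json.loads() parses exactly one JSON document, and any text left over after the first complete object is reported as "Extra data". A small reproduction with made-up strings:

```python
import json

# Two complete JSON objects separated by a newline. Each is valid on its own,
# but together they are not a single JSON document.
two_objects = '{"business_id": "b1"}\n{"business_id": "b2"}'

try:
    json.loads(two_objects)
    error_message = None
except json.JSONDecodeError as exc:
    error_message = exc.msg  # the message without the line/column details

# Parsing each line separately succeeds.
parsed = [json.loads(line) for line in two_objects.splitlines()]

print(error_message)
print(parsed)
```

Under that assumption each line is fine on its own, so the file can look well formed and still be rejected when it is read as a whole.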