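I am trying to load business.json in Python. The file comes from the Yelp academic dataset, which is available for their academic challenge (see https://www.yelp.com/dataset/documentation/json). My goal is to extract all restaurants and their IDs, then find the one restaurant I am interested in. Once I have that restaurant's ID, I want to load review.json and extract all reviews for that restaurant. Sadly, I am stuck at the initial stage of loading the .json file.
This is what business.json looks like: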
{
// string, 22 character unique string business id
"business_id": "tnhfDv5Il8EaGSXZGiuQGg",
// string, the business's name
"name": "Garaje",
// string, the neighborhood's name
"neighborhood": "SoMa",
// string, the full address of the business
"address": "475 3rd St",
// string, the city
"city": "San Francisco",
// string, 2 character state code, if applicable
"state": "CA",
// string, the postal code
"postal code": "94107",
// float, latitude
"latitude": 37.7817529521,
// float, longitude
"longitude": -122.39612197,
// float, star rating, rounded to half-stars
"stars": 4.5,
// integer, number of reviews
"review_count": 1198,
// integer, 0 or 1 for closed or open, respectively
"is_open": 1,
// object, business attributes to values. note: some attribute values might be objects
"attributes": {
"RestaurantsTakeOut": true,
"BusinessParking": {
"garage": false,
"street": true,
"validated": false,
"lot": false,
"valet": false
},
},
// an array of strings of business categories
"categories": [
"Mexican",
"Burgers",
"Gastropubs"
],
// an object of key day to value hours, hours are using a 24hr clock
"hours": {
"Monday": "10:00-21:00",
"Tuesday": "10:00-21:00",
"Friday": "10:00-21:00",
"Wednesday": "10:00-21:00",
"Thursday": "10:00-21:00",
"Sunday": "11:00-18:00",
"Saturday": "10:00-21:00"
}
}
When I try to import business.json with the following code:
import json
jsonBus = json.loads(open('business.json').read())
for item in jsonBus:
    name = item.get("Name")
    businessID = item.get("business_id")
I get the following error:
runfile('/Users/Nico/Google Drive/Python/yelp/yelp_academic.py', wdir='/Users/Nico/Google Drive/Python/yelp')
Traceback (most recent call last):
File "<ipython-input-46-68ba9d6458bc>", line 1, in <module>
runfile('/Users/Nico/Google Drive/Python/yelp/yelp_academic.py', wdir='/Users/Nico/Google Drive/Python/yelp')
File "/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 710, in runfile
execfile(filename, namespace)
File "/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 101, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "/Users/Nico/Google Drive/Python/yelp/yelp_academic.py", line 3, in <module>
jsonBus = json.loads(open('business.json').read())
File "/anaconda3/lib/python3.6/json/__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "/anaconda3/lib/python3.6/json/decoder.py", line 342, in decode
raise JSONDecodeError("Extra data", s, end)
JSONDecodeError: Extra data
Does anyone know why this error occurs?
I am also open to any smarter way of going about this!
Best,
Nico
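Answer 0 (score: 1)
If your JSON file looks exactly as you showed it, it should not contain comments (e.g. // string, 22 character unique string business id), because comments are not part of the JSON standard.
For more details, see: Can comments be used in JSON?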
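A quick way to check this claim is the minimal sketch below, which assumes only Python's standard json module. Both sample strings are made up for illustration. Note that the error in the question is "Extra data" rather than a complaint about a property name, so the comments may exist only in the documentation and not in the downloaded file; that is an assumption worth verifying by opening the file.

```python
import json

# Hypothetical one-field samples; the first mimics the //-style comments
# shown in the documentation, the second is the same object without them.
with_comment = '{\n// string, the business name\n"name": "Garaje"\n}'
without_comment = '{\n"name": "Garaje"\n}'

def try_parse(text):
    """Return the parsed object, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None

parsed_with = try_parse(with_comment)        # comments are not valid JSON
parsed_without = try_parse(without_comment)  # plain JSON parses fine
```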
Answer 1 (score: 0)
I think this works. I use the same dataset and had a similar error; see the comment here, which seemed to help.
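import json

# Parse the file one line at a time: each line is its own JSON object
with open('business.json', encoding='utf-8') as f:
    js = [json.loads(line) for line in f]
for item in js:
    name = item.get("name")
    businessID = item.get("business_id")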
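Building on the line-by-line idea, here is a rough sketch of the full goal from the question: look up a business by name, then collect its reviews. The helper names (iter_records, find_business_ids, reviews_for) are hypothetical, and the sketch assumes that each record in review.json carries a "business_id" field and that both files hold one JSON object per line. Reading lazily avoids holding a whole large file in memory at once.

```python
import json

def iter_records(path):
    """Yield one dict per non-empty line of a one-object-per-line JSON file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

def find_business_ids(business_path, wanted_name):
    """Return the business_id of every record whose name matches exactly."""
    return [
        rec["business_id"]
        for rec in iter_records(business_path)
        if rec.get("name") == wanted_name
    ]

def reviews_for(review_path, business_id):
    """Return all review records that belong to the given business.

    Assumes each review record has a "business_id" field.
    """
    return [
        rec
        for rec in iter_records(review_path)
        if rec.get("business_id") == business_id
    ]
```

For example, ids = find_business_ids('business.json', 'Garaje') followed by reviews_for('review.json', ids[0]) would give the reviews for the first match. Several businesses can share a name, so it may be worth checking the city or address before picking an ID.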
However, I still wonder why json.loads() on the whole file does not work. The file itself looks fine.
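One likely explanation, assuming the file stores one JSON object per line (which is what the line-by-line fix implies): json.loads() expects its input to be a single JSON document. After decoding the first object it finds more text and raises "Extra data". A minimal sketch with a made-up two-line string standing in for the file contents:

```python
import json

# Made-up stand-in for the file contents: two objects, one per line.
text = (
    '{"business_id": "a1", "name": "Garaje"}\n'
    '{"business_id": "b2", "name": "Other"}\n'
)

try:
    json.loads(text)          # whole text at once: not a single JSON document
    error_message = None
except json.JSONDecodeError as exc:
    error_message = exc.msg   # short message without line/column details

# Parsing each line separately works, because every line is valid on its own.
records = [json.loads(line) for line in text.splitlines()]
```

So each line is fine on its own, but the file as a whole is not one JSON value, which would match the traceback in the question.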