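I am trying to read a large JSON file returned by the Yelp API (stored as a .txt file) and convert it to a data frame. My JSON file is in "pretty print" format; the first 3 JSON objects are shown below: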
{
"businesses": [
{
"address1": "11301 Wilshire Blvd",
"address2": "",
"address3": "",
"avg_rating": 3.0,
"categories": [
{
"category_filter": "hospitals",
"name": "Hospitals",
"search_url": "http://www.yelp.com/search?cflt=hospitals&find_desc=&find_loc=11301+Wilshire+Blvd%2C+Los+Angeles+90073"
}
],
"city": "Los Angeles",
"country": "USA",
"country_code": "US",
"distance": 0.0,
"id": "9yWDlJ5l1i6O36Fxp5JIBw",
"is_closed": false,
"mobile_url": "http://m.yelp.com/biz/west-los-angeles-medical-center-los-angeles-2",
"name": "West Los Angeles Medical Center",
"nearby_url": "http://www.yelp.com/search?find_desc=&find_loc=11301+Wilshire+Blvd%2C+Los+Angeles+90073",
"neighborhoods": [],
"phone": "3104783711",
"photo_url": "http://media2.fl.yelpcdn.com/bpthumb/IZ82DgJAy8emp4dX7UvbUw/ms",
"photo_url_small": "http://media2.fl.yelpcdn.com/bpthumb/IZ82DgJAy8emp4dX7UvbUw/ss",
"rating_img_url": "http://s3-media3.fl.yelpcdn.com/assets/2/www/img/34bc8086841c/ico/stars/v1/stars_3.png",
"rating_img_url_small": "http://s3-media3.fl.yelpcdn.com/assets/2/www/img/902abeed0983/ico/stars/v1/stars_small_3.png",
"review_count": 40,
"reviews": [
{
"date": "2014-11-25",
"id": "wO4jShjiPoWDBR_OV3cGmQ",
"mobile_uri": "/biz/west-los-angeles-medical-center-los-angeles-2?full=True&hrid=wO4jShjiPoWDBR_OV3cGmQ",
"rating": 5,
"rating_img_url": "http://s3-media1.fl.yelpcdn.com/assets/2/www/img/f1def11e4e79/ico/stars/v1/stars_5.png",
"rating_img_url_small": "http://s3-media1.fl.yelpcdn.com/assets/2/www/img/c7623205d5cd/ico/stars/v1/stars_small_5.png",
"text_excerpt": "I dropped my super expensive insurance, that is not feasible to afford now, and joined the VA. Shortly after signing up it was found that I needed surgery....",
"url": "http://www.yelp.com/biz/west-los-angeles-medical-center-los-angeles-2?hrid=wO4jShjiPoWDBR_OV3cGmQ",
"user_name": "Dj D.",
"user_photo_url": "http://media1.fl.yelpcdn.com/upthumb/s1RKYlrzhKCZs_zSS0cVOA/ms",
"user_photo_url_small": "http://media1.fl.yelpcdn.com/upthumb/s1RKYlrzhKCZs_zSS0cVOA/ss",
"user_url": "http://www.yelp.com/user_details?userid=cJyDfLw9uJT63MwFgz7XnA"
}
],
"state": "CA",
"state_code": "CA",
"url": "http://www.yelp.com/biz/west-los-angeles-medical-center-los-angeles-2",
"zip": "90073"
},
{
"address1": "11301 Wilshire",
"address2": "Bldg 306",
"address3": "",
"avg_rating": 3.0,
"categories": [
{
"category_filter": "cafeteria",
"name": "Cafeteria",
"search_url": "http://www.yelp.com/search?cflt=cafeteria&find_desc=&find_loc=11301+Wilshire%2C+Los+Angeles+90073"
}
],
"city": "Los Angeles",
"country": "USA",
"country_code": "US",
"distance": 0.0,
"id": "K8eEx2J3pF3b-w6EZwKY5w",
"is_closed": false,
"mobile_url": "http://m.yelp.com/biz/va-canteen-wla-los-angeles",
"name": "VA Canteen WLA",
"nearby_url": "http://www.yelp.com/search?find_desc=&find_loc=11301+Wilshire%2C+Los+Angeles+90073",
"neighborhoods": [],
"phone": "3104783711",
"photo_url": "http://s3-media2.fl.yelpcdn.com/assets/srv0/yelp_styleguide/5f69f303f17c/assets/img/default_avatars/business_medium_square.png",
"photo_url_small": "http://s3-media3.fl.yelpcdn.com/assets/srv0/yelp_styleguide/6671667140ef/assets/img/default_avatars/business_small_square.png",
"rating_img_url": "http://s3-media3.fl.yelpcdn.com/assets/2/www/img/34bc8086841c/ico/stars/v1/stars_3.png",
"rating_img_url_small": "http://s3-media3.fl.yelpcdn.com/assets/2/www/img/902abeed0983/ico/stars/v1/stars_small_3.png",
"review_count": 4,
"reviews": [
{
"date": "2014-11-02",
"id": "rzoQx7o9sla7ig3QZAjtUg",
"mobile_uri": "/biz/va-canteen-wla-los-angeles?full=True&hrid=rzoQx7o9sla7ig3QZAjtUg",
"rating": 3,
"rating_img_url": "http://s3-media3.fl.yelpcdn.com/assets/2/www/img/34bc8086841c/ico/stars/v1/stars_3.png",
"rating_img_url_small": "http://s3-media3.fl.yelpcdn.com/assets/2/www/img/902abeed0983/ico/stars/v1/stars_small_3.png",
"text_excerpt": "This place serves its function. There are a few stations where you can grab food if you don't want to venture off the VA premises for lunch. However, the...",
"url": "http://www.yelp.com/biz/va-canteen-wla-los-angeles?hrid=rzoQx7o9sla7ig3QZAjtUg",
"user_name": "James W.",
"user_photo_url": "http://media1.fl.yelpcdn.com/upthumb/6UlTMXf0VkFmmmXwXe8Flg/ms",
"user_photo_url_small": "http://media1.fl.yelpcdn.com/upthumb/6UlTMXf0VkFmmmXwXe8Flg/ss",
"user_url": "http://www.yelp.com/user_details?userid=qgXcgfdrk5tzmLBq4_h6mQ"
}
],
"state": "CA",
"state_code": "CA",
"url": "http://www.yelp.com/biz/va-canteen-wla-los-angeles",
"zip": "90073"
},
{
"address1": "11301 Wilshire",
"address2": "Bldg 306",
"address3": "",
"avg_rating": 2.0,
"categories": [
{
"category_filter": "cafeteria",
"name": "Cafeteria",
"search_url": "http://www.yelp.com/search?cflt=cafeteria&find_desc=&find_loc=11301+Wilshire%2C+Los+Angeles+90073"
}
],
"city": "Los Angeles",
"country": "USA",
"country_code": "US",
"distance": 0.0,
"id": "4etl04G_-VwP8NJ2F3nu4w",
"is_closed": false,
"mobile_url": "http://m.yelp.com/biz/va-canteen-wla-2-los-angeles",
"name": "VA Canteen WLA 2",
"nearby_url": "http://www.yelp.com/search?find_desc=&find_loc=11301+Wilshire%2C+Los+Angeles+90073",
"neighborhoods": [],
"phone": "3104783711",
"photo_url": "http://s3-media2.fl.yelpcdn.com/assets/srv0/yelp_styleguide/5f69f303f17c/assets/img/default_avatars/business_medium_square.png",
"photo_url_small": "http://s3-media3.fl.yelpcdn.com/assets/srv0/yelp_styleguide/6671667140ef/assets/img/default_avatars/business_small_square.png",
"rating_img_url": "http://s3-media2.fl.yelpcdn.com/assets/2/www/img/b561c24f8341/ico/stars/v1/stars_2.png",
"rating_img_url_small": "http://s3-media2.fl.yelpcdn.com/assets/2/www/img/a6210baec261/ico/stars/v1/stars_small_2.png",
"review_count": 1,
"reviews": [
{
"date": "2014-02-01",
"id": "G9Qr5OpQHs0qo89LFzYIGA",
"mobile_uri": "/biz/va-canteen-wla-2-los-angeles?full=True&hrid=G9Qr5OpQHs0qo89LFzYIGA",
"rating": 2,
"rating_img_url": "http://s3-media2.fl.yelpcdn.com/assets/2/www/img/b561c24f8341/ico/stars/v1/stars_2.png",
"rating_img_url_small": "http://s3-media2.fl.yelpcdn.com/assets/2/www/img/a6210baec261/ico/stars/v1/stars_small_2.png",
"text_excerpt": "Mon-fri 7am to 1:30 pm\n\nBreakfast and Lunch\n\nThe grill team is good, Grill masters!\n\nCoffee is ok",
"url": "http://www.yelp.com/biz/va-canteen-wla-2-los-angeles?hrid=G9Qr5OpQHs0qo89LFzYIGA",
"user_name": "Patrick D.",
"user_photo_url": "http://media1.fl.yelpcdn.com/upthumb/essl4VxDB599GHCamIdDdA/ms",
"user_photo_url_small": "http://media1.fl.yelpcdn.com/upthumb/essl4VxDB599GHCamIdDdA/ss",
"user_url": "http://www.yelp.com/user_details?userid=B7VkaAqckBslmw5HtstA1A"
}
],
"state": "CA",
"state_code": "CA",
"url": "http://www.yelp.com/biz/va-canteen-wla-2-los-angeles",
"zip": "90073"
}
],
"message": {
"code": 0,
"text": "OK",
"version": "1.1.1"
}
}
{
"businesses": [],
"message": {
"code": 0,
"text": "OK",
"version": "1.1.1"
}
}
{
"businesses": [],
"message": {
"code": 0,
"text": "OK",
"version": "1.1.1"
}
}
I tried the following R code:
library(dplyr)
library(plyr)
library(jsonlite)
df <- fromJSON(paste(readLines("Yelp facility pretty print v2.txt"), collapse=""))
But this only returns the first JSON object.
Then I tried:
df <- fromJSON(sprintf("[%s]", paste(readLines("Yelp facility pretty print v2.txt"), collapse=",")))
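But this returns the error "...unexpected character ','; expecting opening string quote (") for key value."
I have verified that there are no blank lines in my JSON file. Any suggestions or help would be greatly appreciated!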
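Answer 0 (score: 0)
Your problem is that the JSON is malformed, as described in this answer: https://stackoverflow.com/a/34714966/5258043
Your input JSON is malformed because it has multiple elements at the root level. This is like defining an XML document with multiple roots, which is not allowed.
The correct way to read the file would be: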
my_data <- rjson::fromJSON(file='./yelp.txt')
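However, that fails because of the multiple root elements. You can either delete everything after the first element, or wrap everything in one big root by adding [ at the top of the text file and ] at the bottom and separating the root elements with commas, so that each entry in the JSON becomes its own list element.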
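As a rough sketch of the wrapping approach, the snippet below does the same thing in R instead of editing the file by hand. It is not part of the original answer: the temporary file, its shortened contents, and the variable names are made up for illustration, and it assumes that no string value in the data contains a "}" followed by whitespace and a "{". Note that it joins the lines with newlines; collapsing with "," as in the question puts a comma after every line, including right after each opening brace, which is what triggers the parser error.

```r
library(jsonlite)

# Hypothetical stand-in for the real file: two pretty-printed root objects
# written back to back, shaped like the ones in the question.
path <- tempfile(fileext = ".txt")
writeLines(c(
  '{',
  '"businesses": [',
  '{',
  '"name": "West Los Angeles Medical Center",',
  '"avg_rating": 3.0',
  '},',
  '{',
  '"name": "VA Canteen WLA",',
  '"avg_rating": 3.0',
  '}',
  '],',
  '"message": {',
  '"code": 0',
  '}',
  '}',
  '{',
  '"businesses": [],',
  '"message": {',
  '"code": 0',
  '}',
  '}'
), path)

# Join lines with newlines so that no separators end up inside the objects.
txt <- paste(readLines(path), collapse = "\n")

# A "}" followed only by whitespace and then "{" marks the boundary between
# two root objects; objects inside arrays already have a comma there.
# Assumption: no string value contains such a sequence.
wrapped <- sprintf("[%s]", gsub("\\}\\s*\\{", "},{", txt, perl = TRUE))

# simplifyVector = FALSE keeps plain nested lists, one element per root.
roots <- fromJSON(wrapped, simplifyVector = FALSE)

# Gather the businesses from all roots and keep a few scalar fields.
businesses <- unlist(lapply(roots, function(r) r$businesses), recursive = FALSE)
df <- do.call(rbind, lapply(businesses, function(b) {
  data.frame(name = b$name, avg_rating = b$avg_rating, stringsAsFactors = FALSE)
}))
print(df)
```

Nested fields such as categories and reviews are left out here because they are lists themselves; they would need their own flattening step before they fit into data frame columns.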
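Note: you can use the jsonlite package instead. I used rjson because its default parsing creates a nicer list, but either package will work; it comes down to preference.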