I have Yelp business data with 100 instances, in the following format:
{
"_id" : ObjectId("5aab338ffc08b46adb7a2320"),
"business_id" : "Pd52CjgyEU3Rb8co6QfTPw",
"name" : "Flight Deck Bar & Grill",
"neighborhood" : "Southeast",
"address" : "6730 S Las Vegas Blvd",
"city" : "Las Vegas",
"state" : "NV",
"postal_code" : "89119",
"latitude" : 36.0669136,
"longitude" : -115.1708484,
"stars" : 4.0,
"review_count" : NumberInt(13),
"is_open" : NumberInt(1),
"attributes" : {
"Alcohol" : "full_bar",
"HasTV" : true,
"NoiseLevel" : "average",
"RestaurantsAttire" : "casual",
"BusinessAcceptsCreditCards" : true,
"Music" : {
"dj" : false,
"background_music" : true,
"no_music" : false,
"karaoke" : false,
"live" : false,
"video" : false,
"jukebox" : false
},
"Ambience" : {
"romantic" : false,
"intimate" : false,
"classy" : false,
"hipster" : false,
"divey" : false,
"touristy" : false,
"trendy" : false,
"upscale" : false,
"casual" : true
},
"RestaurantsGoodForGroups" : true,
"Caters" : true,
"WiFi" : "free",
"RestaurantsReservations" : false,
"RestaurantsTableService" : true,
"RestaurantsTakeOut" : true,
"GoodForKids" : true,
"HappyHour" : true,
"GoodForDancing" : false,
"BikeParking" : true,
"OutdoorSeating" : false,
"RestaurantsPriceRange2" : NumberInt(2),
"RestaurantsDelivery" : false,
"BestNights" : {
"monday" : false,
"tuesday" : false,
"friday" : false,
"wednesday" : true,
"thursday" : false,
"sunday" : false,
"saturday" : false
},
"GoodForMeal" : {
"dessert" : false,
"latenight" : false,
"lunch" : true,
"dinner" : false,
"breakfast" : false,
"brunch" : false
},
"BusinessParking" : {
"garage" : false,
"street" : false,
"validated" : false,
"lot" : true,
"valet" : false
},
"CoatCheck" : false,
"Smoking" : "no",
"WheelchairAccessible" : true
},
"categories" : [
"Nightlife",
"Bars",
"Barbeque",
"Sports Bars",
"American (New)",
"Restaurants"
],
"hours" : {
"Monday" : "8:30-22:30",
"Tuesday" : "8:30-22:30",
"Friday" : "8:30-22:30",
"Wednesday" : "8:30-22:30",
"Thursday" : "8:30-22:30",
"Sunday" : "8:30-22:30",
"Saturday" : "8:30-22:30"
}
}
I need to import it into R. I have the following code:
library('jsonlite')
data<- stream_in(file("~/Desktop/business100.json"))
When I run the code above, I get the following error:
Error: lexical error: invalid char in json text.
{ "_id" : ObjectId("5aab338ffc08b46adb7a2
(right here) ------^
I think there is some problem with the JSON format, but when I look at the file in MongoDB it looks fine. What can I do? Thanks!
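The MongoDB shell prints documents in its own notation, in which `ObjectId(...)` and `NumberInt(...)` are constructor calls rather than JSON values, so a document that looks fine there is not necessarily valid JSON. A minimal check with `jsonlite::validate()`, using a shortened excerpt of the record above (the excerpts are illustrative, not the real file):

```r
library(jsonlite)

# Shortened excerpt of the record, exactly as the mongo shell prints it
shell_txt <- '{ "_id" : ObjectId("5aab338ffc08b46adb7a2320"), "review_count" : NumberInt(13) }'

# The same excerpt with the constructor wrappers removed by hand
plain_txt <- '{ "_id" : "5aab338ffc08b46adb7a2320", "review_count" : 13 }'

shell_ok <- validate(shell_txt)
plain_ok <- validate(plain_txt)

# validate() returns FALSE plus an "err" attribute when parsing fails
print(as.logical(shell_ok))
print(attr(shell_ok, "err"))
print(as.logical(plain_ok))
```

The first check fails at the `O` of `ObjectId`, which is the same spot `stream_in()` points at in the error message.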
Answer 0 (score: 1)
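If you can use mongolite (as suggested in the comments), that is probably the best approach. If you're stuck and can't use it for some reason, you can replace these non-JSON properties and then parse the result with a regular JSON parser.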
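The steps below work on `jsontxt`, a single character string holding the raw file contents. One way to build it with base R, assuming the path from the question (`read_raw_json` is a hypothetical helper name):

```r
# Path taken from the question; adjust as needed
path <- "~/Desktop/business100.json"

# Read every line and glue the lines back together, so that the regex
# replacements and fromJSON() see the whole document as one string
read_raw_json <- function(path) {
  paste(readLines(path, warn = FALSE, encoding = "UTF-8"), collapse = "\n")
}

# Only read the file if it is actually there
if (file.exists(path)) {
  jsontxt <- read_raw_json(path)
}
```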
To generalize, create a vector of (verbatim) strings. I'm assuming each of these properties has the form DiscardableProperty(save_all_here), so a good starting point, based on the data you provided, is:
ptns <- c('ObjectId', 'NumberInt')
(Note the lack of parentheses; the regex below adds them.)
str(jsontxt)
# chr "{ \n \"_id\" : ObjectId(\"5aab338ffc08b46adb7a2320\"), \n \"business_id\" : \"Pd52CjgyEU3Rb8co6QfTPw\", \n \"name\" : "| __truncated__
jsontxt2 <- Reduce(function(txt, p) gsub(sprintf("%s\\(([^)]+)\\)", p), "\\1", txt),
                   ptns, init=jsontxt)
str(jsontxt2)
# chr "{ \n \"_id\" : \"5aab338ffc08b46adb7a2320\", \n \"business_id\" : \"Pd52CjgyEU3Rb8co6QfTPw\", \n \"name\" : \"Flight D"| __truncated__
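To see what the `Reduce()`/`gsub()` step does, here it is applied to a tiny hand-written string (hypothetical, not taken from the data). For each name `p`, `sprintf()` builds the regex `p\(([^)]+)\)`, and the replacement `\1` keeps only what was inside the parentheses:

```r
ptns <- c("ObjectId", "NumberInt")

# Hypothetical helper wrapping the answer's Reduce() call:
# for each wrapper name p, turn p(<contents>) into <contents>
strip_wrappers <- function(txt, ptns) {
  Reduce(function(txt, p) gsub(sprintf("%s\\(([^)]+)\\)", p), "\\1", txt),
         ptns, init = txt)
}

toy <- '{"_id" : ObjectId("abc123"), "n" : NumberInt(13), "is_open" : NumberInt(1)}'
strip_wrappers(toy, ptns)
# returns '{"_id" : "abc123", "n" : 13, "is_open" : 1}'
```

Because `[^)]+` stops at the first `)`, this relies on the wrapped values never containing a closing parenthesis, which holds for the ObjectId and NumberInt values in this data. It also does not check word boundaries, so the text `ObjectId(` inside a string value would be rewritten too.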
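This parses just fine:
str(fromJSON(jsontxt2))
# List of 16
# $ _id : chr "5aab338ffc08b46adb7a2320"
# $ business_id : chr "Pd52CjgyEU3Rb8co6QfTPw"
# $ name : chr "Flight Deck Bar & Grill"
# $ neighborhood: chr "Southeast"
# $ address : chr "6730 S Las Vegas Blvd"
# ...
Edit: the replacement can also be done in a single pass, by folding all of the names into one regex alternation instead of looping over them with Reduce().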
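One way to do the replacement in a single pass, sketched here as an illustration rather than as the answer's own code, is to join the names into one alternation, `(ObjectId|NumberInt)\(([^)]+)\)`, and keep the second capture group. The sketch also cleans the text line by line so it can go back into `stream_in()`, which expects newline-delimited JSON (one complete record per line). That layout is an assumption about `business100.json`; if the file is pretty-printed over many lines, clean the whole string and use `fromJSON()` as above. The two input records are hypothetical stand-ins for `readLines("~/Desktop/business100.json")`.

```r
library(jsonlite)

ptns <- c("ObjectId", "NumberInt")

# One regex for all wrapper names: group 1 is the name, group 2 the contents
single_pass <- sprintf("(%s)\\(([^)]+)\\)", paste(ptns, collapse = "|"))

# Hypothetical helper: replace every wrapper with its contents in one gsub()
clean_mongo_shell <- function(txt) gsub(single_pass, "\\2", txt)

# Hypothetical two-record input, one document per line
raw_lines <- c(
  '{ "_id" : ObjectId("5aab338ffc08b46adb7a2320"), "name" : "Flight Deck Bar & Grill", "review_count" : NumberInt(13) }',
  '{ "_id" : ObjectId("000000000000000000000001"), "name" : "Example", "review_count" : NumberInt(5) }'
)

# Feed the cleaned lines to stream_in() through an in-memory connection
business <- stream_in(textConnection(clean_mongo_shell(raw_lines)), verbose = FALSE)
str(business)
```

With the real file, `raw_lines` would come from `readLines()` on the path in the question, and the result should be a data frame with one row per business.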