Error parsing a JSON file in R

Time: 2018-04-07 23:41:27

Tags: r json

The Yelp business data contains 100 instances in the following format:

{ 
    "_id" : ObjectId("5aab338ffc08b46adb7a2320"), 
    "business_id" : "Pd52CjgyEU3Rb8co6QfTPw", 
    "name" : "Flight Deck Bar & Grill", 
    "neighborhood" : "Southeast", 
    "address" : "6730 S Las Vegas Blvd", 
    "city" : "Las Vegas", 
    "state" : "NV", 
    "postal_code" : "89119", 
    "latitude" : 36.0669136, 
    "longitude" : -115.1708484, 
    "stars" : 4.0, 
    "review_count" : NumberInt(13), 
    "is_open" : NumberInt(1), 
    "attributes" : {
        "Alcohol" : "full_bar", 
        "HasTV" : true, 
        "NoiseLevel" : "average", 
        "RestaurantsAttire" : "casual", 
        "BusinessAcceptsCreditCards" : true, 
        "Music" : {
            "dj" : false, 
            "background_music" : true, 
            "no_music" : false, 
            "karaoke" : false, 
            "live" : false, 
            "video" : false, 
            "jukebox" : false
        }, 
        "Ambience" : {
            "romantic" : false, 
            "intimate" : false, 
            "classy" : false, 
            "hipster" : false, 
            "divey" : false, 
            "touristy" : false, 
            "trendy" : false, 
            "upscale" : false, 
            "casual" : true
        }, 
        "RestaurantsGoodForGroups" : true, 
        "Caters" : true, 
        "WiFi" : "free", 
        "RestaurantsReservations" : false, 
        "RestaurantsTableService" : true, 
        "RestaurantsTakeOut" : true, 
        "GoodForKids" : true, 
        "HappyHour" : true, 
        "GoodForDancing" : false, 
        "BikeParking" : true, 
        "OutdoorSeating" : false, 
        "RestaurantsPriceRange2" : NumberInt(2), 
        "RestaurantsDelivery" : false, 
        "BestNights" : {
            "monday" : false, 
            "tuesday" : false, 
            "friday" : false, 
            "wednesday" : true, 
            "thursday" : false, 
            "sunday" : false, 
            "saturday" : false
        }, 
        "GoodForMeal" : {
            "dessert" : false, 
            "latenight" : false, 
            "lunch" : true, 
            "dinner" : false, 
            "breakfast" : false, 
            "brunch" : false
        }, 
        "BusinessParking" : {
            "garage" : false, 
            "street" : false, 
            "validated" : false, 
            "lot" : true, 
            "valet" : false
        }, 
        "CoatCheck" : false, 
        "Smoking" : "no", 
        "WheelchairAccessible" : true
    }, 
    "categories" : [
        "Nightlife", 
        "Bars", 
        "Barbeque", 
        "Sports Bars", 
        "American (New)", 
        "Restaurants"
    ], 
    "hours" : {
        "Monday" : "8:30-22:30", 
        "Tuesday" : "8:30-22:30", 
        "Friday" : "8:30-22:30", 
        "Wednesday" : "8:30-22:30", 
        "Thursday" : "8:30-22:30", 
        "Sunday" : "8:30-22:30", 
        "Saturday" : "8:30-22:30"
    }
}

I need to import it into R. I have the following code:

library('jsonlite')
data <- stream_in(file("~/Desktop/business100.json"))

When I run the code above, it produces the following error:

Error: lexical error: invalid char in json text.
                         {     "_id" : ObjectId("5aab338ffc08b46adb7a2
                     (right here) ------^

I think there is some problem with the format of the JSON, but when I look at the JSON file in MongoDB it looks fine. What can be done? Thanks!

1 answer:

Answer 0: (score: 1)

If mongolite (as suggested in the comments) is an option, it is probably the best way to go. If you are stuck and cannot use it for some reason, you can replace these non-JSON properties and then parse the text with a regular JSON parser.

To generalize, create a vector of (literal) strings. I am assuming that each property has the form DiscardableProperty(save_all_here), so a good starting point based on the data you provided is:

ptns <- c('ObjectId', 'NumberInt')
str(jsontxt)
# chr "{ \n \"_id\" : ObjectId(\"5aab338ffc08b46adb7a2320\"), \n \"business_id\" : \"Pd52CjgyEU3Rb8co6QfTPw\", \n \"name\" : "| __truncated__
# Strip each wrapper in turn, keeping only what was inside the parentheses
jsontxt2 <- Reduce(function(txt, p) gsub(sprintf("%s\\(([^)]+)\\)", p), "\\1", txt),
                   ptns, init=jsontxt)
str(jsontxt2)
# chr "{ \n \"_id\" : \"5aab338ffc08b46adb7a2320\", \n \"business_id\" : \"Pd52CjgyEU3Rb8co6QfTPw\", \n \"name\" : \"Flight D"| __truncated__

(Note the absence of ObjectId.)

This parses just fine:

str(fromJSON(jsontxt2))
# List of 16
#  $ _id         : chr "5aab338ffc08b46adb7a2320"
#  $ business_id : chr "Pd52CjgyEU3Rb8co6QfTPw"
#  $ name        : chr "Flight Deck Bar & Grill"
#  $ neighborhood: chr "Southeast"
#  $ address     : chr "6730 S Las Vegas Blvd"
# ...

Edit: the replacement can also be done in a single pass.
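
As a rough illustration of a single-pass version, one option is to join the wrapper names into one alternation and call gsub once. This is only a sketch under stated assumptions, not necessarily the code the answer originally showed: the helper name strip_mongo_wrappers and its wrappers argument are made up for this example, the wrapper names are assumed to contain no regex metacharacters, and the wrapped values are assumed to contain no closing parenthesis. The sample string is a shortened version of the record in the question.

```r
# Sketch only: strip_mongo_wrappers() and `wrappers` are illustrative names,
# not part of jsonlite, mongolite, or the original answer.
strip_mongo_wrappers <- function(txt, wrappers = c("ObjectId", "NumberInt")) {
  # Build one alternation, e.g. (ObjectId|NumberInt)\(([^)]+)\)
  pattern <- sprintf("(%s)\\(([^)]+)\\)", paste(wrappers, collapse = "|"))
  # Keep only what was inside the parentheses (the second capture group).
  gsub(pattern, "\\2", txt)
}

# Shortened sample based on the record shown in the question.
jsontxt <- '{ "_id" : ObjectId("5aab338ffc08b46adb7a2320"), "review_count" : NumberInt(13), "is_open" : NumberInt(1) }'
jsontxt2 <- strip_mongo_wrappers(jsontxt)
cat(jsontxt2, "\n")

# Parse with jsonlite only if it is installed.
if (requireNamespace("jsonlite", quietly = TRUE)) {
  str(jsonlite::fromJSON(jsontxt2))
}
```

For a file on disk, the text could first be read with something like paste(readLines(path), collapse = "\n"), where path is a placeholder. Whether all 100 records then parse in one call depends on how the export separates them, which the question does not show.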