R - Import and merge many (nested?) JSON files

Date: 2016-02-01 03:02:43

Tags: json r nested

I want to merge 150 small JSON files (all formatted with the same variables), which I have already imported into R with jsonlite.

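The problem is that each file is imported as a list of length 1. I can convert individual files to data frames, but I can't find a way to convert all of them systematically.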

The goal is to merge everything into a single dataset.

Sample from one of the JSON files:

{
    "data": [
        {
            "EventId": "20020528X00745",
            "narrative": "NTSB investigators may not have traveled in support of this investigation and used data provided by various sources to prepare this aircraft accident report.During the dark night cross-country flight, while at a cruise altitude of 2.000 feet msl, the pilot initiated a climb to 3,000 feet.  A few minutes later, the engine's rpm dropped 200-300 rpm.  The 67-hour pilot increased throttle to check for an rpm response.  Subsequently, the engine lost power, and a forced landing was initiated.  While approaching to land, the pilot noticed trees in front of the airplanes flight path and started looking for another place to land, but couldn't see anything because it was too dark.  Subsequently, the aircraft impacted tress coming to rest upright.  An examination of the engine under the supervision of an FAA inspector, revealed the left magneto's internal gears did not rotate with the engine.  Removal of the left magneto revealed only one of two rubber drive isolators inside the ignition harness cap.    Internal inspection revealed the contact points on the left hand side of the magneto did not open on rotation.  Further examination of the airplane, displayed the ignition key turned to the left magneto only.  The pilot reported to the NTSB investigator-in-charge, that he did not touch any switch while exiting the aircraft.",
            "probable_cause": "The pilot's failure to set the ignition key to the both magnetos position, which resulted in a loss of engine power.  Contributing factors were the failure of the left magneto, the lack of suitable terrain for the forced landing, and  the dark night."
        },
        {
            "EventId": "20090414X14441",
            "narrative": "NTSB investigators used data provided by various entities, including, but not limited to, the Federal Aviation Administration and/or the operator and did not travel in support of this investigation to prepare this aircraft accident report.The pilot was following a highway to the northwest at 10,000 feet mean sea level.  He crossed the mountain pass between 700 and 1,000 feet above ground level climbing slowly.  Once on the west side of the pass, approaching the base of some cliffs, they encountered a strong down draft and the airspeed dropped rapidly and the airplane started to descend. The pilot reports that he attempted to keep the airspeed at 85 knots and climb but, that the airplane continued to lose altitude.  He checked the engine instruments and did not note any degradation of engine performance.  The airplane continued to descend.  The pilot executed a forced landing in approximately the center of the valley ahead of them. The pilot reported that there were no preimpact mechanical malfunctions or failures.  Based on the temperature and pressure readings from the closest weather reporting station, the density altitude at the accident site was about 9,200 feet.",
            "probable_cause": "The pilot's encounter with a windshear/downdraft that exceeded the climb performance capabilities of the airplane."
        },
        ...
    ]
}

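To see why each file comes in as "a list of 1", here is a minimal sketch that parses a shortened copy of the two sample records (the text fields are truncated placeholders, not the real records). The top-level object has a single key, `data`, and jsonlite's default simplification turns the array of records under it into a data frame.

```r
library(jsonlite)

# Shortened copy of the two sample records above (text fields truncated)
txt <- '{
  "data": [
    {"EventId": "20020528X00745",
     "narrative": "NTSB investigators may not have traveled",
     "probable_cause": "The pilots failure to set the ignition key"},
    {"EventId": "20090414X14441",
     "narrative": "NTSB investigators used data provided by various entities",
     "probable_cause": "The pilots encounter with a windshear/downdraft"}
  ]
}'

parsed <- fromJSON(txt)   # default simplifyVector = TRUE
length(parsed)            # 1: the only top-level key is "data"
names(parsed)             # "data"
str(parsed$data)          # a data.frame with 2 rows and 3 columns
```

So the list itself is only a wrapper; the data frame you want is always `$data`.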
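  1. Imported with fromJSON(file_000.json), which creates a "Large list"
  2. After importing, df <- file_000.json$data produces a data frame with 3 variables
  3. However, I can't find a way to create 150 new dfs from the large-list input. I have tried apply, do.call, functions, and loops.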

    Two attempts that work for a single data frame, but don't get me to the 150 I need:

    test2 <- as.data.frame(file_000.json$data)
    test3 <- unnest(file_000.json)
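There is no need to create 150 separately named data frames: reading every file into one list and stacking that list once does the job. A minimal base-R sketch, assuming all files sit in one folder; the folder name `json_dir` and the `file_NNN.json` names are hypothetical, and three dummy files are generated so the sketch runs on its own.

```r
library(jsonlite)

# Hypothetical folder holding the JSON files; three dummy files are
# written to a temp folder here so the sketch is self-contained.
json_dir <- file.path(tempdir(), "ntsb_json")
dir.create(json_dir, showWarnings = FALSE)

template <- '{"data": [
  {"EventId": "%s-1", "narrative": "NTSB investigators", "probable_cause": "The pilots failure"},
  {"EventId": "%s-2", "narrative": "NTSB investigators", "probable_cause": "The pilots failure"}
]}'
for (i in 1:3) {
  id <- sprintf("file_%03d", i)
  writeLines(sprintf(template, id, id), file.path(json_dir, paste0(id, ".json")))
}

# 1. Collect the paths of all .json files in the folder
files <- list.files(json_dir, pattern = "\\.json$", full.names = TRUE)

# 2. Read each file and keep only its $data data frame
#    -> one list holding all data frames, instead of many separate objects
dfs <- lapply(files, function(f) fromJSON(f)$data)

# 3. Stack them into a single data frame (rbind matches columns by name)
combined <- do.call(rbind, dfs)
dim(combined)  # 6 rows, 3 columns
```

For the real data, only `json_dir` needs to point at the actual folder; the dummy-file section can be dropped.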
    

1 answer:

Answer 0 (score: 0)

library(dplyr)
library(jsonlite)
x <- '{
    "data": [
        {
            "EventId": "20020528X00745",
            "narrative": "NTSB investigators",
            "probable_cause": "The pilots failure"
        },
        {
            "EventId": "asdfasfasfasfasdasdf",
            "narrative": "NTSB investigators",
            "probable_cause": "The pilots failure"
        },
        {
            "EventId": "asdfafsdf",
            "narrative": "NTSB investigators",
            "probable_cause": "The pilots failure"
        }
    ]
}
'
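# Write the sample JSON to 10 temporary files to stand in for the real ones
# (for real data, build `files` with list.files(..., full.names = TRUE))
files <- replicate(10, tempfile(fileext = ".json"))
for (i in seq_along(files)) cat(x, file = files[i])
# Read each file, keep its $data data frame, and stack them all into one
dplyr::bind_rows(lapply(files, function(z) {
  jsonlite::fromJSON(z)$data
}))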
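With 150 real files it is often useful to know which file each row came from, for example to trace a malformed record. A small variation on the answer, sketched in base R: tag each file's rows with a hypothetical `source_file` column before stacking. (In dplyr, `bind_rows()` also has an `.id` argument that adds an identifier column, taken from the list names when a named list is passed.)

```r
library(jsonlite)

# Self-contained: write the same small record set to 3 temp files
x <- '{"data": [
  {"EventId": "20020528X00745", "narrative": "NTSB investigators", "probable_cause": "The pilots failure"},
  {"EventId": "asdfafsdf", "narrative": "NTSB investigators", "probable_cause": "The pilots failure"}
]}'
files <- replicate(3, tempfile(fileext = ".json"))
for (f in files) cat(x, file = f)

# Read each file and tag its rows with the name of the file they came from
dfs <- lapply(files, function(f) {
  d <- fromJSON(f)$data
  d$source_file <- basename(f)
  d
})
combined <- do.call(rbind, dfs)

# Sanity check: the combined row count equals the sum over all files
stopifnot(nrow(combined) == sum(vapply(dfs, nrow, integer(1))))
table(combined$source_file)  # 2 rows per file
```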

#>    Source: local data frame [30 x 3]
#>
#>                EventId          narrative     probable_cause
#>                  (chr)              (chr)              (chr)
#>    1        20020528X00745 NTSB investigators The pilots failure
#>    2  asdfasfasfasfasdasdf NTSB investigators The pilots failure
#>    3             asdfafsdf NTSB investigators The pilots failure
#>    4        20020528X00745 NTSB investigators The pilots failure
#>    5  asdfasfasfasfasdasdf NTSB investigators The pilots failure
#>    6             asdfafsdf NTSB investigators The pilots failure
#>    7        20020528X00745 NTSB investigators The pilots failure
#>    8  asdfasfasfasfasdasdf NTSB investigators The pilots failure
#>    9             asdfafsdf NTSB investigators The pilots failure
#>    10       20020528X00745 NTSB investigators The pilots failure
#>    ..                  ...                ...                ...