Convert JSON to a data frame in R?

Time: 2016-12-14 20:53:22

Tags: json r

I need to convert a JSON file into a data frame. Each line in the JSON file may have a different number of entries. For example:

{"timestamp":"2016-12-13T04:04:06.394-0500",
"test101":"2016-12-13T04:04:06.382-0500",
"error":"false","from":"xon","event":"DAT","BT":"work","cd":"E","id":"IBM",
"key":"20161213040330617511","begin_work":"2016-12-13T04:04:06.383-0500","@version":"1","@timestamp":"2016-12-14T20:04:29.502Z"}

{"timestamp":"2016-12-13T04:04:05.318-0500","test101":"2016-12-13T04:03:46.074-0500","error":"false","from":"de","event":"cp","BT":"work","cd":"dsh","id":"appl",
"key":"142314089",
"begin_work":"2016-12-13T04:03:46.074-0500",
"refresh":"2016-12-13T04:03:45.920-0500",
"co_refresh":"2016-12-13T04:03:45.769-0500",
"test104":"2016-12-13T04:03:45.832-0500",
"test104":"2016-12-13T04:03:45.832-0500",
"test105":"2016-12-13T04:03:46.031-0500",
"test7":"2016-12-13T04:03:46.032-0500",
"t-test9":"2016-12-13T04:03:45.704-0500",
"test10_StartDateTimeStamp":"2016-12-13T04:03:45.704-0500",
"stop":"2016-12-13T04:03:50.772-0500",
"stop_again":"2016-12-13T04:03:46.091-0500",
"@version":"1","@timestamp":"2016-12-14T20:04:29.503Z"}
{"timestamp":"2016-12-13T04:04:07.113-0500","test101":"2016-12-13T04:04:07.068-0500","error":"false","from":"xon","event":"DAT","BT":"work","cd":"E","id":"3YPS","key":"20161213040318326935","begin_work":"2016-12-13T04:04:07.069-0500","@version":"1","@timestamp":"2016-12-14T20:04:29.505Z"}

I need to start parsing each record at the field named "key" and stop at the field named "@version".

The data frame needs to look like this:

key group time
20161213040330617511 begin_work  2016-12-13T04:04:06.383-0500
142314089 begin_work 2016-12-13T04:03:46.074-0500
142314089 refresh 2016-12-13T04:03:45.920-0500
142314089 co_refresh 2016-12-13T04:03:45.769-0500
142314089 test104 2016-12-13T04:03:45.832-0500

I tried something like this:

library(jsonlite)
library(data.table) 

setwd("C:/file/")

filenames <- list.files("system", pattern="*json*", full.names=TRUE)

dflist <- lapply(filenames, function(i) {
  jsonlite::fromJSON(
    paste0("[",
           paste0(readLines(i),collapse=","),
           "]"),flatten=TRUE
  )
})

d<-rbindlist(dflist, use.names=TRUE, fill=TRUE)

I need to get the key-value pairs into a 3-column data frame.

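Instead, I get the field names that come after key as columns, with NA as their values. Any ideas on how to convert this JSON to a data frame in R?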

1 Answer:

Answer 0 (score: 1)

Here is something you can try, a combination of dplyr and tidyr:

library(dplyr)
library(tidyr)
library(jsonlite)

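data <- jsonlite::fromJSON("data.json")
# One tibble per record; bind_rows() fills fields missing from a record with NA
lapply(data, function(d) as_tibble(d)) %>%
  bind_rows() %>%
  # Wide to long: every column except timestamp and key becomes a (group, time) pair
  gather(group, time, -timestamp, -key, na.rm = TRUE) %>%
  select(key, group, time)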

BTW, I had to change your JSON example slightly. Here is the JSON file I used:

{"x":{"timestamp":"2016-12-13T04:04:06.394-0500",
"test101":"2016-12-13T04:04:06.382-0500",
"error":"false","from":"xon","event":"DAT","BT":"work","cd":"E","id":"IBM",
"key":"20161213040330617511","begin_work":"2016-12-13T04:04:06.383-0500","@version":"1","@timestamp":"2016-12-14T20:04:29.502Z"},
"y":{"timestamp":"2016-12-13T04:04:05.318-0500","test101":"2016-12-13T04:03:46.074-0500","error":"false","from":"de","event":"cp","BT":"work","cd":"dsh","id":"appl",
"key":"142314089",
"begin_work":"2016-12-13T04:03:46.074-0500",
"refresh":"2016-12-13T04:03:45.920-0500",
"co_refresh":"2016-12-13T04:03:45.769-0500",
"test104":"2016-12-13T04:03:45.832-0500",
"test105":"2016-12-13T04:03:46.031-0500",
"test7":"2016-12-13T04:03:46.032-0500",
"t-test9":"2016-12-13T04:03:45.704-0500",
"test10_StartDateTimeStamp":"2016-12-13T04:03:45.704-0500",
"stop":"2016-12-13T04:03:50.772-0500",
"stop_again":"2016-12-13T04:03:46.091-0500",
"@version":"1","@timestamp":"2016-12-14T20:04:29.503Z"},
"z":{"timestamp":"2016-12-13T04:04:07.113-0500","test101":"2016-12-13T04:04:07.068-0500","error":"false","from":"xon","event":"DAT","BT":"work","cd":"E","id":"3YPS","key":"20161213040318326935","begin_work":"2016-12-13T04:04:07.069-0500","@version":"1","@timestamp":"2016-12-14T20:04:29.505Z"}}
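
If you would rather not edit the file, a rough alternative is to parse each line on its own and keep only the fields that sit between "key" and "@version", which is what the question asks for. The sketch below makes some assumptions: every record occupies exactly one line (as the readLines() call in the question implies), all values are plain strings, and jsonlite keeps the fields in file order. The `lines` vector is a shortened, hypothetical stand-in for the real file, and `record_to_long` is a helper name invented here.

```r
library(jsonlite)

# Hypothetical sample: one JSON object per line, shortened from the question's data.
lines <- c(
  '{"timestamp":"2016-12-13T04:04:06.394-0500","error":"false","key":"20161213040330617511","begin_work":"2016-12-13T04:04:06.383-0500","@version":"1","@timestamp":"2016-12-14T20:04:29.502Z"}',
  '{"timestamp":"2016-12-13T04:04:05.318-0500","error":"false","key":"142314089","begin_work":"2016-12-13T04:03:46.074-0500","refresh":"2016-12-13T04:03:45.920-0500","co_refresh":"2016-12-13T04:03:45.769-0500","@version":"1","@timestamp":"2016-12-14T20:04:29.503Z"}'
)

# Convert one record to long format, keeping only the fields that sit
# strictly between "key" and "@version" (assumed helper, not from the thread).
record_to_long <- function(line) {
  rec <- fromJSON(line)            # named list of strings, in file order
  nms <- names(rec)
  start <- match("key", nms)
  end <- match("@version", nms)
  # Skip records that lack either marker or have nothing between them
  if (is.na(start) || is.na(end) || end - start < 2) return(NULL)
  idx <- (start + 1):(end - 1)
  data.frame(
    key = rec[["key"]],
    group = nms[idx],
    time = unlist(rec[idx], use.names = FALSE),
    stringsAsFactors = FALSE
  )
}

pieces <- Filter(Negate(is.null), lapply(lines, record_to_long))
result <- do.call(rbind, pieces)
print(result)
```

To run this against a real file, replace the sample vector with something like `lines <- readLines(path)`, where `path` points to one of the files returned by list.files() in the question. Because each record is reshaped before the pieces are combined, no wide table is built, so the NA-filled columns never appear.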