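I have a JSON file with several levels of nesting, and I am struggling to get it into a workable data frame. I created a toy example of mock data based on the real structure: here is the gist.
This is my desired output. The output could be "longer", or it could include other variables from the original JSON, but I am showing the core problem.
This is the part of the JSON that shows the deepest level of nesting, which I want to get into the semi-long format shown in white above (a fully wide format would be fine too).
I have tried a lot of things with this object: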
myList <- jsonlite::fromJSON("example.json", flatten=TRUE)$results
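ranging from subsetting with [] and [[]] and combining the pieces with cbind(), to various other attempts at unnesting the nested lists. None of it is worth showing. I would benefit greatly from advice on the best approach.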
Answer 0 (score: 2)
Does this get you further? (It is a rough structure):
library(tidyverse)
x <- (jsonlite::fromJSON("/Users/hrbrmstr/r7/gh/labs-research/2018-11-portland-ciso-event/example.json"))
jsonlite::stream_out(x$results, con = gzfile("ex-res.json.gz"))
y <- ndjson::stream_in("ex-res.json.gz", "tbl")
gather(y, path, path_val, starts_with("path")) %>%
gather(flow, flow_val, starts_with("flow")) %>%
gather(name, name_val, starts_with("values.pdep")) %>%
gather(intervention, interv_val, starts_with("values.inter")) %>%
glimpse()
## Observations: 87,696
## Variables: 18
## $ contact.name <chr> "Person 1", "Person 2", "Person 1", "Person 2", "Person 1", "Person 2", "Person 1", "Person 2"...
## $ contact.uuid <chr> "k0dcjs", "rd3jfui", "k0dcjs", "rd3jfui", "k0dcjs", "rd3jfui", "k0dcjs", "rd3jfui", "k0dcjs", ...
## $ created_on <chr> "2016-02-08T07:00:15.093813Z", "2016-02-08T07:00:15.093813Z", "2016-02-08T07:00:15.093813Z", "...
## $ id <dbl> 1234, 1235, 1234, 1235, 1234, 1235, 1234, 1235, 1234, 1235, 1234, 1235, 1234, 1235, 1234, 1235...
## $ modified_on <chr> "2016-02-09T04:42:54.812323Z", "2016-02-08T08:09:51.545160Z", "2016-02-09T04:42:54.812323Z", "...
## $ responded <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE...
## $ start.uuid <chr> "dnxh4g", "kfj4dsi", "dnxh4g", "kfj4dsi", "dnxh4g", "kfj4dsi", "dnxh4g", "kfj4dsi", "dnxh4g", ...
## $ uuid <chr> "esn4dk", "qask9dj", "esn4dk", "qask9dj", "esn4dk", "qask9dj", "esn4dk", "qask9dj", "esn4dk", ...
## $ exit_type <chr> NA, "completed", NA, "completed", NA, "completed", NA, "completed", NA, "completed", NA, "comp...
## $ exited_on <chr> NA, "2016-02-08T08:09:51.544998Z", NA, "2016-02-08T08:09:51.544998Z", NA, "2016-02-08T08:09:51...
## $ path <chr> "path.0.node", "path.0.node", "path.0.time", "path.0.time", "path.1.node", "path.1.node", "pat...
## $ path_val <chr> "ecb4cb11-6cca-4791-a950-c448e9300846", "ecb4cb11-6cca-4791-a950-c448e9300846", "2016-02-08T07...
## $ flow <chr> "flow.name", "flow.name", "flow.name", "flow.name", "flow.name", "flow.name", "flow.name", "fl...
## $ flow_val <chr> "weeklyratings", "weeklyratings", "weeklyratings", "weeklyratings", "weeklyratings", "weeklyra...
## $ name <chr> "values.pdeps1.category", "values.pdeps1.category", "values.pdeps1.category", "values.pdeps1.c...
## $ name_val <chr> "0 - 7", "0 - 7", "0 - 7", "0 - 7", "0 - 7", "0 - 7", "0 - 7", "0 - 7", "0 - 7", "0 - 7", "0 -...
## $ intervention <chr> "values.intervention", "values.intervention", "values.intervention", "values.intervention", "v...
## $ interv_val <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
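To see what a single gather() step in the pipeline above does to the flattened column names, here is a minimal sketch on a made-up two-row tibble. The object `toy` and its values are invented for illustration; only the dotted column-naming pattern follows the output above.

```r
library(tidyr)
library(tibble)

# Hypothetical stand-in for the flattened `y`; the values are invented.
toy <- tibble(
  id = c(1234, 1235),
  values.pdeps1.category = c("0 - 7", "0 - 7"),
  values.pdeps1.value = c("3", "5")
)

# Collapse every column whose name starts with "values.pdep" into a
# key column (`name`) and a value column (`name_val`); `id` is repeated.
long <- gather(toy, name, name_val, starts_with("values.pdep"))
long
```

Each gathered column contributes one row per original row, which is why chaining four gather() calls on the real data multiplies the row count so quickly.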
Full approach:
gather(y, path, path_val, starts_with("path")) %>%
gather(flow, flow_val, starts_with("flow")) %>%
gather(name, name_val, starts_with("values.pdep")) %>%
gather(intervention, interv_val, starts_with("values.inter")) %>%
# keep only the ".value" entries; fixed = TRUE matches the dot literally
filter(grepl(".value", name, fixed = TRUE)) %>%
filter(grepl("node", path)) %>%
# strip the "values." prefix and ".value" suffix to get the variable name
mutate(variable = gsub("values.", "", name, fixed = TRUE)) %>%
mutate(variable = gsub(".value", "", variable, fixed = TRUE)) %>%
# drop the duplicate rows created by the repeated gather() calls
distinct(contact.name, uuid, name, .keep_all = TRUE) %>%
select(id, uuid, contact.uuid, variable, name_val, created_on, modified_on) %>%
arrange(id, created_on) # optional, for wide output: %>% spread(variable, name_val)
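The optional wide step mentioned in the last comment can be sketched on a toy input as follows. The rows in `long` are invented, and the anchored patterns in sub() are an assumption about how the names look (a "values." prefix and a ".value" suffix), not something confirmed against the real file.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical long-format rows after the filtering steps; values are invented.
long <- tibble(
  id = c(1234, 1234, 1235, 1235),
  name = c("values.pdeps1.value", "values.pdeps2.value",
           "values.pdeps1.value", "values.pdeps2.value"),
  name_val = c("3", "4", "5", "6")
)

wide <- long %>%
  # strip the assumed "values." prefix and ".value" suffix
  mutate(variable = sub("^values\\.", "", name),
         variable = sub("\\.value$", "", variable)) %>%
  select(id, variable, name_val) %>%
  # one column per variable, one row per id
  spread(variable, name_val)
wide
```

In tidyr 1.0.0 and later, pivot_wider(names_from = variable, values_from = name_val) is the recommended replacement for spread() and should give the same shape here.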