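I frequently receive data in the form of nested lists, and I end up writing all sorts of code to flatten these lists into data.frames. I would like a more general solution, so that I am not writing a new one for each individual list. Here is some example data to highlight my problem.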
data_list <- list(structure(list(local_date_time = "2010-01-05T13:30:00",
value = -9999, data_quality = list(structure(list(qualifierid = 19,
qualifier_description = "Passed sanity check; see incident report IR_8",
valid = FALSE), .Names = c("qualifierid", "qualifier_description",
"valid")))), .Names = c("local_date_time", "value", "data_quality"
)), structure(list(local_date_time = "2010-01-05T14:00:00", value = -9999,
data_quality = list(structure(list(qualifierid = 19, qualifier_description = "Passed sanity check; see incident report IR_8",
valid = FALSE), .Names = c("qualifierid", "qualifier_description",
"valid")))), .Names = c("local_date_time", "value", "data_quality"
)), structure(list(local_date_time = "2010-01-05T14:30:00", value = -9999,
data_quality = list(structure(list(qualifierid = 19, qualifier_description = "Passed sanity check; see incident report IR_8",
valid = FALSE), .Names = c("qualifierid", "qualifier_description",
"valid")))), .Names = c("local_date_time", "value", "data_quality"
)), structure(list(local_date_time = "2010-01-05T15:00:00", value = -9999,
data_quality = list(structure(list(qualifierid = 19, qualifier_description = "Passed sanity check; see incident report IR_8",
valid = FALSE), .Names = c("qualifierid", "qualifier_description",
"valid")))), .Names = c("local_date_time", "value", "data_quality"
)), structure(list(local_date_time = "2010-01-05T15:30:00", value = -9999,
data_quality = list(structure(list(qualifierid = 19, qualifier_description = "Passed sanity check; see incident report IR_8",
valid = FALSE), .Names = c("qualifierid", "qualifier_description",
"valid")))), .Names = c("local_date_time", "value", "data_quality"
)), structure(list(local_date_time = "2010-01-05T16:00:00", value = -9999,
data_quality = list(structure(list(qualifierid = 19, qualifier_description = "Passed sanity check; see incident report IR_8",
valid = FALSE), .Names = c("qualifierid", "qualifier_description",
"valid")))), .Names = c("local_date_time", "value", "data_quality"
)), structure(list(local_date_time = "2010-01-05T16:30:00", value = -9999,
data_quality = list(structure(list(qualifierid = 19, qualifier_description = "Passed sanity check; see incident report IR_8",
valid = FALSE), .Names = c("qualifierid", "qualifier_description",
"valid")))), .Names = c("local_date_time", "value", "data_quality"
)), structure(list(local_date_time = "2010-01-05T17:00:00", value = -9999,
data_quality = list(structure(list(qualifierid = 19, qualifier_description = "Passed sanity check; see incident report IR_8",
valid = FALSE), .Names = c("qualifierid", "qualifier_description",
"valid")))), .Names = c("local_date_time", "value", "data_quality"
)), structure(list(local_date_time = "2010-01-05T17:30:00", value = -9999,
data_quality = list(structure(list(qualifierid = 19, qualifier_description = "Passed sanity check; see incident report IR_8",
valid = FALSE), .Names = c("qualifierid", "qualifier_description",
"valid")))), .Names = c("local_date_time", "value", "data_quality"
)), structure(list(local_date_time = "2010-01-05T18:00:00", value = -9999,
data_quality = list(structure(list(qualifierid = 19, qualifier_description = "Passed sanity check; see incident report IR_8",
valid = FALSE), .Names = c("qualifierid", "qualifier_description",
"valid")))), .Names = c("local_date_time", "value", "data_quality")))
The easiest way is of course to rbind the lists. data.table's rbindlist is fast on larger lists, like so:
library(data.table)
rbindlist(data_list)
But this returns:
local_date_time value data_quality
1: 2010-01-05T13:30:00 -9999 <list>
2: 2010-01-05T14:00:00 -9999 <list>
3: 2010-01-05T14:30:00 -9999 <list>
4: 2010-01-05T15:00:00 -9999 <list>
5: 2010-01-05T15:30:00 -9999 <list>
6: 2010-01-05T16:00:00 -9999 <list>
7: 2010-01-05T16:30:00 -9999 <list>
8: 2010-01-05T17:00:00 -9999 <list>
9: 2010-01-05T17:30:00 -9999 <list>
10: 2010-01-05T18:00:00 -9999 <list>
This is not ideal, because the last column is actually a nested list of 3 items. I could use plyr:
library(plyr)
result <- ldply(data_list, function(x) {
  cbind(data.frame(t(unlist(x[1:2]))), data.frame(t(unlist(x[3]))))
})
This works well. Is there a way to generalize this approach to lists whose nested lists may have different formats? If the list had only a single level, a simple rbindlist should work. In this case I know that the 3rd element has a sublist, but often I do not know that in advance. Writing a custom wrapper for each case would be rather tedious.
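Answer 0 (score: 2)
Some time ago, on SO, I came across a function called LinearizeNestedList by Akhil S Bhel. It "flattens" nested lists.
In your case, you want to "flatten" the sublists, not the main list itself.
Perhaps it can be used in your case as follows: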
library(devtools)
source_gist("https://gist.github.com/mrdwab/4205477")
# Sourcing https://gist.github.com/mrdwab/4205477/raw/1bd86c697b89de9941834882f1085c8312076e38/LinearizeNestedList.R
# SHA-1 hash of file is dde479195258dbad9367274ceedbd5a68251478a
x <- do.call(rbind.data.frame, lapply(data_list, LinearizeNestedList))
x
# local_date_time value data_quality.1.qualifierid
# 2 2010-01-05T13:30:00 -9999 19
# 21 2010-01-05T14:00:00 -9999 19
# 3 2010-01-05T14:30:00 -9999 19
# 4 2010-01-05T15:00:00 -9999 19
# 5 2010-01-05T15:30:00 -9999 19
# 6 2010-01-05T16:00:00 -9999 19
# 7 2010-01-05T16:30:00 -9999 19
# 8 2010-01-05T17:00:00 -9999 19
# 9 2010-01-05T17:30:00 -9999 19
# 10 2010-01-05T18:00:00 -9999 19
# data_quality.1.qualifier_description data_quality.1.valid
# 2 Passed sanity check; see incident report IR_8 FALSE
# 21 Passed sanity check; see incident report IR_8 FALSE
# 3 Passed sanity check; see incident report IR_8 FALSE
# 4 Passed sanity check; see incident report IR_8 FALSE
# 5 Passed sanity check; see incident report IR_8 FALSE
# 6 Passed sanity check; see incident report IR_8 FALSE
# 7 Passed sanity check; see incident report IR_8 FALSE
# 8 Passed sanity check; see incident report IR_8 FALSE
# 9 Passed sanity check; see incident report IR_8 FALSE
# 10 Passed sanity check; see incident report IR_8 FALSE
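If sourcing a gist is not an option, the same idea can be sketched in base R. This is only a rough illustration, not the LinearizeNestedList implementation: `flatten_record` and `bind_records` are hypothetical helper names, the two-record sample data is made up, and the sketch assumes every leaf is a length-one atomic value. Unnamed sublists are labelled by position, and fields missing from a record are padded with NA so that records with different nested formats can still be bound.

```r
# Hypothetical helper: recursively flatten one nested list into a flat named list.
# Assumes every leaf is a length-one atomic value.
flatten_record <- function(x, prefix = NULL) {
  out <- list()
  nms <- names(x)
  for (i in seq_along(x)) {
    # Use the element name, or its position when it has no name.
    key <- if (!is.null(nms) && nzchar(nms[i])) nms[i] else as.character(i)
    full <- if (is.null(prefix)) key else paste(prefix, key, sep = ".")
    if (is.list(x[[i]])) {
      out <- c(out, flatten_record(x[[i]], full))  # descend into the sublist
    } else {
      out[[full]] <- x[[i]]                        # keep the leaf and its type
    }
  }
  out
}

# Hypothetical helper: flatten every record, pad absent fields with NA, then rbind.
bind_records <- function(records) {
  flat <- lapply(records, flatten_record)
  cols <- unique(unlist(lapply(flat, names)))
  rows <- lapply(flat, function(r) {
    for (nm in setdiff(cols, names(r))) r[[nm]] <- NA
    as.data.frame(r[cols], stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Made-up records whose nested parts differ in shape.
records <- list(
  list(id = 1, quality = list(list(flag = TRUE))),
  list(id = 2, quality = list(list(flag = FALSE, note = "checked")))
)
res <- bind_records(records)
str(res)
```

Because the leaves are never passed through unlist() as a whole, numeric and logical fields keep their types instead of being coerced to character. For long lists, the final do.call(rbind, ...) is likely to be the slow step.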
Answer 1 (score: 0)
A simple lapply with as.data.frame will also do the job, at least as long as you only have one level of nesting:
> res <- do.call(rbind, lapply(data_list, as.data.frame))
> str(res)
'data.frame': 10 obs. of 5 variables:
$ local_date_time : Factor w/ 10 levels "2010-01-05T13:30:00",..: 1 2 3 4 5 6 7 8 9 10
$ value : num -9999 -9999 -9999 -9999 -9999 ...
$ data_quality.qualifierid : num 19 19 19 19 19 19 19 19 19 19
$ data_quality.qualifier_description: Factor w/ 1 level "Passed sanity check; see incident report IR_8": 1 1 1 1 1 1 1 1 1 1
$ data_quality.valid : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
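The Factor columns in the str() output above reflect the old default of stringsAsFactors = TRUE; from R 4.0.0 on, the default is FALSE. A small sketch, on made-up two-record data of the same shape as data_list, that passes the argument explicitly so the timestamps come back as character on either version:

```r
# Made-up sample with the same shape as data_list (two records only).
records <- list(
  list(local_date_time = "2010-01-05T13:30:00", value = -9999,
       data_quality = list(list(qualifierid = 19, valid = FALSE))),
  list(local_date_time = "2010-01-05T14:00:00", value = -9999,
       data_quality = list(list(qualifierid = 19, valid = FALSE)))
)

# stringsAsFactors = FALSE is forwarded by lapply() to as.data.frame().
res <- do.call(rbind, lapply(records, as.data.frame, stringsAsFactors = FALSE))
str(res)
```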