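I have a data frame with 3 columns, one of which is made up of lists. I need to match the variables in my data frame with the variables inside those lists, so I need to unlist the lists.
To explain this better, here is an example of my data: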
DF:
i.d. registered_at steps
x 2013-12-20 list of dates and integers
y 2013-10-01 list of dates and integers
z 2014-01-15 list of dates and integers
my_list for x:
Day steps
2012-03-16 556
2012-04-22 3
2013-12-24 1119
The lists have different lengths. I would like my data to look like this:
final_df:
i.d. registered_at Day steps
x 2013-12-20 2012-03-16 556
x 2013-12-20 2012-04-22 3
x 2013-12-20 2013-12-24 1119
y 2013-10-01 2013-09-08 19
y 2013-10-01 2013-11-14 208
z 2014-01-15 2014-01-19 5
I have tried the following:
df2 <- data.frame(matrix(unlist(df$steps), nrow = 957, byrow = T))
install.packages("plyr")
library(plyr)
df3 <- ldply (df$steps, data.frame)
unlist(df$steps, recursive = TRUE, use.names = TRUE)
The following shows the str() result for the first row of my data:
> str(ID1)
'data.frame': 1 obs. of 3 variables:
$ id : int 5
$ registered_at: chr "2011-05-20"
$ steps :List of 1
..$ :'data.frame': 957 obs. of 2 variables:
.. ..$ day : chr "2011-02-16" "2011-02-23" "2012-02-12" "2012-02-24" ...
.. ..$ steps: int 1057 208 709 1221 8656 16279 11988 1628 1431 17379 ...
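In addition, here is a snapshot of the dput() result for a single ID. I used the first row of my data frame, i.e. "x", and I had to shorten it with "..." because there are too many values to post here.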
> dput(ID1)
structure(list(id = 5L, registered_at = "2011-05-20", steps = list(
structure(list(day = c("2011-02-16", "2011-02-23", "2012-02-12",
"2012-02-24", ...),
steps = c(11057L, 208L, 709L, 1221L, 8656L, 16279L, 11988L, 1628L,
1431L, 17379L, ...
)), .Names = c("day", "steps"), class = "data.frame", row.names = c(NA,
957L)))), .Names = c("id", "registered_at", "steps"), row.names = 1L,
class = "data.frame")
> dput(head(df,5))
structure(c("function (x, df1, df2, ncp, log = FALSE) ", "{",
" if (missing(ncp)) ", " .Call(C_df, x, df1, df2, log)",
" else .Call(C_dnf, x, df1, df2, ncp, log)"), .Dim = c(5L,
1L), .Dimnames = list(c("1", "2", "3", "4", "5"), ""), class =
"noquote")
Does anyone have any tips? Thanks!
Answer 0: (score: 1)
Please try this:
Based on the output of dput(ID1), I created the following data.frame:
df1 = structure(list(id = 5L, registered_at = "2011-05-20", steps = list(
  structure(list(day = c("2011-02-16", "2011-02-23", "2012-02-12", "2012-02-24"),
                 steps = c(11057L, 208L, 709L, 1221L)),
            .Names = c("day", "steps"), class = "data.frame",
            row.names = c(NA, -4L)))),  # 4 rows in this shortened copy, not the original 957
  .Names = c("id", "registered_at", "steps"), row.names = 1L, class = "data.frame")
df1 looks like this:
>df1
#id registered_at steps
#1 5 2011-05-20 2011-02-16, 2011-02-23, 2012-02-12, 2012-02-24, 11057, 208, 709, 1221
After that, using the ddply function from the plyr package, you can easily create the desired data.frame:
library(plyr)
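ddply(.data = df1, .variables = 'id', function(t) {
  # number of observations in this id's nested data frame
  n = length(t$steps[[1]]$day)
  # flatten to one character vector: the first n values are days, the rest are step counts
  steps = unlist(t$steps, recursive = TRUE)
  data.frame(id = t$id, registered_at = t$registered_at,
             day = steps[1:n],
             steps = as.integer(steps[(n + 1):length(steps)]))  # unlist() turned the counts into strings
})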
This returns:
# id registered_at day steps
#1 5 2011-05-20 2011-02-16 11057
#2 5 2011-05-20 2011-02-23 208
#3 5 2011-05-20 2012-02-12 709
#4 5 2011-05-20 2012-02-24 1221
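For comparison, the same reshaping can be sketched in base R without plyr. This is only an illustrative alternative, not the answerer's method; the toy data frame below is hypothetical and just mimics the question's layout (an id, a registration date, and a list column holding one data frame per row).

```r
# Hypothetical toy data with the same shape as the question's data frame
df <- data.frame(
  id = c("x", "y"),
  registered_at = c("2013-12-20", "2013-10-01"),
  stringsAsFactors = FALSE
)
df$steps <- list(
  data.frame(day = c("2012-03-16", "2012-04-22"), steps = c(556L, 3L),
             stringsAsFactors = FALSE),
  data.frame(day = "2013-09-08", steps = 19L, stringsAsFactors = FALSE)
)

# Build one small data frame per outer row; the scalar id and date are
# recycled to the length of that row's nested data frame
pieces <- lapply(seq_len(nrow(df)), function(i) {
  inner <- df$steps[[i]]
  data.frame(
    id = df$id[i],
    registered_at = df$registered_at[i],
    day = inner$day,
    steps = inner$steps,
    stringsAsFactors = FALSE
  )
})

# Stack the per-row pieces into one long data frame
final_df <- do.call(rbind, pieces)
```

Because each piece is built from the nested data frame's own columns, the step counts keep their integer type and nested data frames of different lengths need no special handling.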
Answer 1: (score: 0)
How about this?
Test data
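library(tibble)  # tribble(), as_tibble()
library(dplyr)   # %>%
df_nest <- list(
  Date = c("2012-03-16", "2012-04-22", "2013-12-24"),
  number = c(556, 3, 1119)
)
# dates are quoted; unquoted, 2013-12-20 would be evaluated as a subtraction
df <- tribble(
  ~id, ~important_date, ~dta,
  "x", "2013-12-20", df_nest,
  "y", "2013-12-18", df_nest,
  "z", "2013-12-16", df_nest
)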
Then we loop over each row, unnest its list, and bind the pieces into a new data frame, result:
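result = NULL
for (row in seq_len(nrow(df))) {
  # combine the scalar columns with the unnested list; as_tibble() recycles the scalars
  new_rows = c(id = df$id[row], important_date = df$important_date[row],
               df$dta[row] %>% unlist(recursive = FALSE)) %>% as_tibble()
  result = rbind(result, new_rows)
}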
Answer 2: (score: 0)
As Mikko Marttila commented, the simple answer is:
df2 <- tidyr::unnest(df, steps)
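To make that one-liner easy to try, here is a minimal self-contained sketch. The toy tibble is hypothetical and simply mirrors the example in the question; it assumes a recent tidyr (1.0 or later), where the column to expand is passed as `cols`.

```r
library(tibble)
library(tidyr)

# Hypothetical toy data: one nested data frame of (day, steps) per id
df <- tibble(
  id = c("x", "y", "z"),
  registered_at = c("2013-12-20", "2013-10-01", "2014-01-15"),
  steps = list(
    data.frame(day = c("2012-03-16", "2012-04-22", "2013-12-24"),
               steps = c(556L, 3L, 1119L)),
    data.frame(day = c("2013-09-08", "2013-11-14"), steps = c(19L, 208L)),
    data.frame(day = "2014-01-19", steps = 5L)
  )
)

# Expand the list column: each nested row becomes its own row, and the
# outer id / registered_at values are repeated alongside it
final_df <- unnest(df, cols = steps)
```

The outer `steps` list column is replaced by the nested data frame's own `day` and `steps` columns, so the result has the four columns the question asks for.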