Unlisting a list column in a data frame

Time: 2017-10-24 19:07:42

Tags: r list dataframe

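I have a data frame with 3 columns, one of which is made up of lists. I need to match the variables in my data frame with the variables inside the lists, so I need to unlist the list column.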

To explain this better, here is an example of my data:

DF:

 i.d.    registered_at     steps
 x        2013-12-20        list of dates and integers
 y        2013-10-01        list of dates and integers
 z        2014-01-15        list of dates and integers

my_list for x:

   Day           steps
2012-03-16        556
2012-04-22         3
2013-12-24        1119

The lists have different lengths. I would like my data to look like this:

final_df:

 i.d.    registered_at         Day           steps
 x        2013-12-20        2012-03-16        556
 x        2013-12-20        2012-04-22         3
 x        2013-12-20        2013-12-24        1119
 y        2013-10-01        2013-09-08         19
 y        2013-10-01        2013-11-14        208
 z        2014-01-15        2014-01-19         5

I tried the following:

df2 <- data.frame(matrix(unlist(df$steps), nrow = 957, byrow = T))


install.packages("plyr")
library(plyr)
df3 <- ldply (df$steps, data.frame)


unlist(df$steps, recursive = TRUE, use.names = TRUE)
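A likely reason these attempts do not give the desired shape: `unlist()` concatenates every column of every nested data frame into one atomic vector, so days and step counts end up mixed together (and coerced to character) with no record of which id they came from. A small sketch with hypothetical toy data (`steps_col` and `flat` are illustrative names):

```r
# Hypothetical toy list column: two nested data.frames of different lengths.
steps_col <- list(
  data.frame(day = c("2012-03-16", "2012-04-22"), steps = c(556L, 3L),
             stringsAsFactors = FALSE),
  data.frame(day = "2013-09-08", steps = 19L, stringsAsFactors = FALSE)
)

# Recursive unlist flattens each data.frame column by column, then
# concatenates everything into a single vector of the most general type.
flat <- unlist(steps_col, use.names = FALSE)
print(flat)

# The result is a character vector: the integer steps were coerced, and
# nothing indicates which outer row (id) each value belongs to.
print(class(flat))
```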

The following shows the str() output for the first row of my data:

> str(ID1)
'data.frame':   1 obs. of  3 variables:
 $ id           : int 5
 $ registered_at: chr "2011-05-20"
 $ steps        :List of 1
  ..$ :'data.frame':    957 obs. of  2 variables:
  .. ..$ day  : chr  "2011-02-16" "2011-02-23" "2012-02-12" "2012-02-24" ...
  .. ..$ steps: int  1057 208 709 1221 8656 16279 11988 1628 1431 17379 ...

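Also, here is a snapshot of the dput() output for a single ID. I used the first row of my data frame (e.g. "x"), and I had to shorten it with "..." because there are too many values to post here.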
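Rather than shortening the `dput()` output by hand, one option is to trim the nested data frame first and then call `dput()` on the trimmed copy, which gives output that can be pasted back into R. A minimal sketch; `ID1` here is a hypothetical stand-in with made-up values, shaped like the `str()` output above (one outer row, 957 nested rows), and `ID1_small` is an illustrative name:

```r
# Hypothetical stand-in for ID1: one outer row, 957 nested rows of made-up data.
ID1 <- data.frame(id = 5L, registered_at = "2011-05-20", stringsAsFactors = FALSE)
ID1$steps <- list(data.frame(
  day = format(as.Date("2011-02-16") + 0:956),
  steps = seq_len(957),
  stringsAsFactors = FALSE
))

# Keep only the first 4 nested rows, then dput() the shortened copy.
ID1_small <- ID1
ID1_small$steps[[1]] <- head(ID1_small$steps[[1]], 4)
dput(ID1_small)
```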

> dput(ID1)
structure(list(id = 5L, registered_at = "2011-05-20", steps = list(
    structure(list(day = c("2011-02-16", "2011-02-23", "2012-02-12",
    "2012-02-24", ...),
        steps = c(11057L, 208L, 709L, 1221L, 8656L, 16279L, 11988L, 1628L,
        1431L, 17379L, ...
        )), .Names = c("day", "steps"), class = "data.frame", row.names = c(NA,
    957L)))), .Names = c("id", "registered_at", "steps"), row.names = 1L, class = "data.frame")

> dput(head(df,5))
structure(c("function (x, df1, df2, ncp, log = FALSE) ", "{", 
"    if (missing(ncp)) ", "        .Call(C_df, x, df1, df2, log)", 
"    else .Call(C_dnf, x, df1, df2, ncp, log)"), .Dim = c(5L, 
1L), .Dimnames = list(c("1", "2", "3", "4", "5"), ""), class = 
"noquote")
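Note that this last output is not the data at all: it is the start of the source code of `df()`, the F-distribution density function from the stats package. When no data frame called `df` exists in the workspace, the name resolves to that function, and `head()` on a function returns the first lines of its deparsed source. A quick check, assuming a session where no object named `df` has been created:

```r
# With no user object named `df`, the name is found on the search path
# as stats::df, the F-distribution density function.
print(is.function(df))
print(identical(df, stats::df))

# head() on a function returns the first lines of its deparsed source
# as a "noquote" matrix, matching the dput(head(df, 5)) output above.
print(head(df, 2))
```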

Does anyone have any tips? Thanks!

3 Answers:

Answer 0 (score: 1)

Please try this:

Based on the output of dput(ID1), I created the following data.frame:

# The nested data.frame has 4 rows here, so its compact row names are c(NA, -4L)
df1 <- structure(list(id = 5L, registered_at = "2011-05-20", steps = list(
  structure(list(day = c("2011-02-16", "2011-02-23", "2012-02-12", "2012-02-24"),
                 steps = c(11057L, 208L, 709L, 1221L)),
            .Names = c("day", "steps"), class = "data.frame",
            row.names = c(NA, -4L)))),
  .Names = c("id", "registered_at", "steps"), row.names = 1L, class = "data.frame")

df1 looks like this:

>df1
#id registered_at                                                                 steps
#1  5    2011-05-20 2011-02-16, 2011-02-23, 2012-02-12, 2012-02-24, 11057, 208, 709, 1221

Then, using the ddply function from the plyr package, you can easily create the desired data.frame:

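library(plyr)

ddply(.data = df1, .variables = "id", function(t) {
  # t is the subset of df1 for one id; its nested data.frame is t$steps[[1]]
  nested <- t$steps[[1]]
  # Recycle id and registered_at over the nested rows; taking day and steps
  # directly (instead of unlist()) keeps steps as integers, not character
  data.frame(id = t$id, registered_at = t$registered_at,
             day = nested$day, steps = nested$steps)
})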

This returns:

#  id registered_at        day steps
#1  5    2011-05-20 2011-02-16 11057
#2  5    2011-05-20 2011-02-23   208
#3  5    2011-05-20 2012-02-12   709
#4  5    2011-05-20 2012-02-24  1221

Answer 1 (score: 0)

How about this?

Test data:

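library(tibble)    # tribble(), as_tibble()
library(magrittr)  # %>%

df_nest <- list(
  Date = c("2012-03-16", "2012-04-22", "2013-12-24"),
  number = c(556, 3, 1119)
)

# Dates must be quoted: unquoted, 2013-12-20 is evaluated as subtraction (1981)
df <- tribble(
  ~id, ~important_date, ~dta,
  "x", "2013-12-20", df_nest,
  "y", "2013-12-18", df_nest,
  "z", "2013-12-16", df_nest
)

Then we loop over each row, expand the list, and bind the pieces into a new data frame, result:

result <- NULL
for (row in seq_len(nrow(df))) {
  # Combine the scalar columns with the unlisted nested list; as_tibble()
  # recycles the length-1 id and date to the length of the nested vectors
  result <- rbind(result,
                  c(id = df$id[row],
                    important_date = df$important_date[row],
                    df$dta[row] %>% unlist(recursive = FALSE)) %>% as_tibble())
}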
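Growing `result` with `rbind()` inside the loop builds a new, larger object on every iteration. A common alternative, sketched here in base R under the assumption that each element of the list column is a data frame (as in the question's `str()` output), builds one piece per row with `Map()` and binds them once at the end. `df_q` and `expand_row` are hypothetical names:

```r
# Hypothetical data shaped like the question: a list column of data.frames.
df_q <- data.frame(id = c("x", "y"),
                   registered_at = c("2013-12-20", "2013-10-01"),
                   stringsAsFactors = FALSE)
df_q$steps <- list(
  data.frame(day = c("2012-03-16", "2012-04-22"), steps = c(556L, 3L),
             stringsAsFactors = FALSE),
  data.frame(day = "2013-09-08", steps = 19L, stringsAsFactors = FALSE)
)

# Expand one outer row: the scalar columns are recycled to the nested length.
expand_row <- function(id, registered_at, nested) {
  data.frame(id = id, registered_at = registered_at, nested,
             stringsAsFactors = FALSE)
}

# Build all pieces first, then bind once; unname() avoids row names
# derived from the ids.
pieces <- Map(expand_row, df_q$id, df_q$registered_at, df_q$steps)
result <- do.call(rbind, unname(pieces))
rownames(result) <- NULL
print(result)
```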

Answer 2 (score: 0)

As Mikko Marttila commented, the simple answer is:

df2 <- tidyr::unnest(df, steps)
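For readers without tidyr, roughly the same reshaping can be done in base R: repeat each outer row once per nested row, then stack the nested data frames next to it. A sketch assuming every list element is a data frame with the same columns (`day`, `steps`); `df_b` and `n_nested` are illustrative names, and the data are the example values from the question:

```r
# Example data from the question: the "steps" column holds one data.frame per id.
df_b <- data.frame(id = c("x", "y", "z"),
                   registered_at = c("2013-12-20", "2013-10-01", "2014-01-15"),
                   stringsAsFactors = FALSE)
df_b$steps <- list(
  data.frame(day = c("2012-03-16", "2012-04-22", "2013-12-24"),
             steps = c(556L, 3L, 1119L), stringsAsFactors = FALSE),
  data.frame(day = c("2013-09-08", "2013-11-14"), steps = c(19L, 208L),
             stringsAsFactors = FALSE),
  data.frame(day = "2014-01-19", steps = 5L, stringsAsFactors = FALSE)
)

# How many nested rows each outer row expands into.
n_nested <- vapply(df_b$steps, nrow, integer(1))

# Repeat the outer rows (without the list column) and stack the nested rows.
outer <- df_b[rep(seq_len(nrow(df_b)), n_nested), c("id", "registered_at")]
inner <- do.call(rbind, df_b$steps)
final_df <- cbind(outer, inner)
rownames(final_df) <- NULL
print(final_df)
```

The row order of `outer` and `inner` lines up because both follow the order of the rows in `df_b`.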