Unnesting a list column of data frames in an R data frame

Time: 2016-05-09 14:21:12

Tags: r data.table

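In R I have a data.frame (or data.table). One column of this data.frame holds, in every cell, a list of lists (a data.frame).

I can turn this column into a single data.frame with rbindlist(data$Subdocuments), but the result no longer has the other columns of the original data.frame.

How can I efficiently unnest this list column while keeping the other columns attached to the new data.frame?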

    library(data.table)

    data <- structure(list(ID = c("1", "2", "3"), Country = c("Netherlands", 
    "Germany", "Belgium"), Subdocuments = list(structure(list(Value = c("5", 
    "5", "1", "3", "2", "1", "1", "1", "2", "5", "3", "2", "4", "5", 
    "5", "2"), Label = c("Test1", "Test2", "Test3", "Test4", "Test5", 
    "Test6", "Test7", "Test8", "Test9", "Test10", "Test11", "Test12", 
    "Test13", "Test14", "Test15", "Test16"), Year = c(2001, 2002, 
    2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 
    2014, 2015, 2016)), .Names = c("Value", "Label", "Year"), class = "data.frame", row.names = c(NA, 
    16L)), structure(list(Value = c("5", "4", "3", "2", "2", "2", 
    "1", "1", "5", "4", "4", "4", "5", "1", "1", "3"), Label = c("Test1", 
    "Test2", "Test3", "Test4", "Test5", "Test6", "Test7", "Test8", 
    "Test9", "Test10", "Test11", "Test12", "Test13", "Test14", "Test15", 
    "Test16"), Year = c(2001, 2002, 2003, 2004, 2005, 2006, 2007, 
    2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016)), .Names = c("Value", 
    "Label", "Year"), class = "data.frame", row.names = c(NA, 16L
    )), structure(list(Value = c("1", "2", "3", "1", "1", "4", "5", 
    "1", "2", "3", "2", "2", "1", "1", "1", "5"), Label = c("Test1", 
    "Test2", "Test3", "Test4", "Test5", "Test6", "Test7", "Test8", 
    "Test9", "Test10", "Test11", "Test12", "Test13", "Test14", "Test15", 
    "Test16"), Year = c(2001, 2002, 2003, 2004, 2005, 2006, 2007, 
    2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016)), .Names = c("Value", 
    "Label", "Year"), class = "data.table", row.names = c(NA, 16L
    )))), .Names = c("ID", "Country", "Subdocuments"), row.names = c(NA, 
    -3L), class = "data.frame")

2 answers:

Answer 0: (score: 4)

I would do:

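setDT(data)  # convert to a data.table by reference

dfcol   = "Subdocuments"
othcols = setdiff(names(data), dfcol)

# idcol = TRUE adds a .id column with each sub-table's position in the list
subs = rbindlist(data[[dfcol]], idcol = TRUE)
# look up the parent row for every sub-row and copy over the other columns
subs[, (othcols) := data[.id, othcols, with = FALSE]]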

If you would rather not call setDT(data), drop the with=FALSE argument in the last line, i.e. use data[.id, othcols] there.
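To make the idea easier to try out, here is a minimal, self-contained sketch of the same technique on a small made-up data set. The toy data and the helper name unnest_dfcol are assumptions introduced for illustration; they are not part of the question or of data.table. Instead of assigning the other columns by reference, the sketch repeats each parent row once per sub-row and binds the two tables side by side.

```r
library(data.table)

# Toy data (hypothetical, much smaller than the question's example)
toy <- data.frame(ID = c("1", "2"),
                  Country = c("Netherlands", "Germany"),
                  stringsAsFactors = FALSE)
toy$Subdocuments <- list(
  data.frame(Value = c("5", "1"), Year = c(2001, 2002), stringsAsFactors = FALSE),
  data.frame(Value = "4", Year = 2001, stringsAsFactors = FALSE)
)

# unnest_dfcol() is a hypothetical helper name, not a data.table function
unnest_dfcol <- function(d, dfcol) {
  othcols <- setdiff(names(d), dfcol)
  # idcol = TRUE adds an integer .id column: the position of each
  # sub-table in the list, which is also the parent row number
  subs <- rbindlist(d[[dfcol]], idcol = TRUE)
  ids <- subs[[".id"]]
  subs[, ".id" := NULL]
  # Repeat each parent row once per sub-row, then bind the columns
  parent <- as.data.table(d)[ids, othcols, with = FALSE]
  cbind(parent, subs)
}

res <- unnest_dfcol(toy, "Subdocuments")
print(res)
```

Because the parent rows are looked up by position rather than by a key column, this sketch does not require ID to be unique or to contain any particular values.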

Answer 1: (score: 0)

This may help:

library(data.table)
rbindlist(setNames(data[[3]], do.call(paste, data[1:2])), idcol=TRUE)[
        , c("ID", "Country") := tstrsplit(.id, " ")][, .id := NULL][]
#    Value  Label Year ID     Country
# 1:     5  Test1 2001  1 Netherlands
# 2:     5  Test2 2002  1 Netherlands
# 3:     1  Test3 2003  1 Netherlands
# 4:     3  Test4 2004  1 Netherlands
# 5:     2  Test5 2005  1 Netherlands
# 6:     1  Test6 2006  1 Netherlands
# 7:     1  Test7 2007  1 Netherlands
# 8:     1  Test8 2008  1 Netherlands
# 9:     2  Test9 2009  1 Netherlands
#10:     5 Test10 2010  1 Netherlands
#11:     3 Test11 2011  1 Netherlands
#12:     2 Test12 2012  1 Netherlands
#13:     4 Test13 2013  1 Netherlands
#14:     5 Test14 2014  1 Netherlands
#15:     5 Test15 2015  1 Netherlands
#16:     2 Test16 2016  1 Netherlands
#17:     5  Test1 2001  2     Germany
#18:     4  Test2 2002  2     Germany
#19:     3  Test3 2003  2     Germany
#20:     2  Test4 2004  2     Germany
#21:     2  Test5 2005  2     Germany
#22:     2  Test6 2006  2     Germany
#23:     1  Test7 2007  2     Germany
#24:     1  Test8 2008  2     Germany
#25:     5  Test9 2009  2     Germany
#26:     4 Test10 2010  2     Germany
#27:     4 Test11 2011  2     Germany
#28:     4 Test12 2012  2     Germany
#29:     5 Test13 2013  2     Germany
#30:     1 Test14 2014  2     Germany
#31:     1 Test15 2015  2     Germany
#32:     3 Test16 2016  2     Germany
#33:     1  Test1 2001  3     Belgium
#34:     2  Test2 2002  3     Belgium
#35:     3  Test3 2003  3     Belgium
#36:     1  Test4 2004  3     Belgium
#37:     1  Test5 2005  3     Belgium
#38:     4  Test6 2006  3     Belgium
#39:     5  Test7 2007  3     Belgium
#40:     1  Test8 2008  3     Belgium
#41:     2  Test9 2009  3     Belgium
#42:     3 Test10 2010  3     Belgium
#43:     2 Test11 2011  3     Belgium
#44:     2 Test12 2012  3     Belgium
#45:     1 Test13 2013  3     Belgium
#46:     1 Test14 2014  3     Belgium
#47:     1 Test15 2015  3     Belgium
#48:     5 Test16 2016  3     Belgium

Note: 'data' is taken from the OP's own post.

Or, using dplyr:

library(dplyr)
bind_rows(data[[3]], .id="ID") %>% 
            left_join(data[-3], ., by = "ID")
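
One thing to be aware of: with an unnamed list, bind_rows() fills the .id column with a running sequence ("1", "2", "3", ...), so the join above lines up only because the IDs in the question happen to be "1", "2", "3" in row order. A hedged sketch of a variant that names each sub-table after its parent ID first is shown below; the toy data is made up for illustration, and the sketch assumes the ID values are unique.

```r
library(dplyr)

# Toy data (hypothetical); IDs deliberately differ from the row positions
toy <- data.frame(ID = c("a", "b"),
                  Country = c("Netherlands", "Germany"),
                  stringsAsFactors = FALSE)
toy$Subdocuments <- list(
  data.frame(Value = c("5", "1"), Year = c(2001, 2002), stringsAsFactors = FALSE),
  data.frame(Value = "4", Year = 2001, stringsAsFactors = FALSE)
)

# Name each sub-table after its parent ID so that .id carries the real key
subs <- bind_rows(setNames(toy$Subdocuments, toy$ID), .id = "ID")

# Attach the remaining parent columns by joining on that key
res <- left_join(toy[c("ID", "Country")], subs, by = "ID")
print(res)
```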