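In R I have a data.frame (or data.table). One of its columns is a list column in which every cell holds a data.frame.
I can collapse that column into a single data.frame with rbindlist(data$Subdocuments),
but the result no longer contains the other columns of the original data.frame.
How can I efficiently unnest this list column while keeping the other columns attached to the new data.frame?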
library(data.table)
data <- structure(list(ID = c("1", "2", "3"), Country = c("Netherlands",
"Germany", "Belgium"), Subdocuments = list(structure(list(Value = c("5",
"5", "1", "3", "2", "1", "1", "1", "2", "5", "3", "2", "4", "5",
"5", "2"), Label = c("Test1", "Test2", "Test3", "Test4", "Test5",
"Test6", "Test7", "Test8", "Test9", "Test10", "Test11", "Test12",
"Test13", "Test14", "Test15", "Test16"), Year = c(2001, 2002,
2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013,
2014, 2015, 2016)), .Names = c("Value", "Label", "Year"), class = "data.frame", row.names = c(NA,
16L)), structure(list(Value = c("5", "4", "3", "2", "2", "2",
"1", "1", "5", "4", "4", "4", "5", "1", "1", "3"), Label = c("Test1",
"Test2", "Test3", "Test4", "Test5", "Test6", "Test7", "Test8",
"Test9", "Test10", "Test11", "Test12", "Test13", "Test14", "Test15",
"Test16"), Year = c(2001, 2002, 2003, 2004, 2005, 2006, 2007,
2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016)), .Names = c("Value",
"Label", "Year"), class = "data.frame", row.names = c(NA, 16L
)), structure(list(Value = c("1", "2", "3", "1", "1", "4", "5",
"1", "2", "3", "2", "2", "1", "1", "1", "5"), Label = c("Test1",
"Test2", "Test3", "Test4", "Test5", "Test6", "Test7", "Test8",
"Test9", "Test10", "Test11", "Test12", "Test13", "Test14", "Test15",
"Test16"), Year = c(2001, 2002, 2003, 2004, 2005, 2006, 2007,
2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016)), .Names = c("Value",
"Label", "Year"), class = "data.frame", row.names = c(NA, 16L
)))), .Names = c("ID", "Country", "Subdocuments"), row.names = c(NA,
-3L), class = "data.frame")
Answer 0 (score: 4)
I would do:
setDT(data)
dfcol = "Subdocuments"
othcols = setdiff(names(data), dfcol)
subs = rbindlist(data[[dfcol]], idcol=TRUE)
subs[, (othcols) := data[.id, othcols, with=FALSE]]
If you would rather not call setDT(data), replace data[.id, othcols, with=FALSE] in the last line with data[.id, othcols].
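To make the mechanics of this answer easier to follow, here is a minimal sketch on a small made-up table. The object `toy` and its values are hypothetical and not part of the question; the IDs are deliberately non-numeric to show that the lookup relies on row position (the integer `.id` produced by `rbindlist`), not on the ID values.

```r
library(data.table)

# Hypothetical toy data: two parent rows, each holding a small nested data.frame
toy <- data.frame(ID = c("a", "b"), Country = c("Netherlands", "Germany"),
                  stringsAsFactors = FALSE)
toy$Subdocuments <- list(
  data.frame(Value = c("5", "1"), Year = c(2001, 2002), stringsAsFactors = FALSE),
  data.frame(Value = "3", Year = 2001, stringsAsFactors = FALSE)
)

setDT(toy)
dfcol   <- "Subdocuments"
othcols <- setdiff(names(toy), dfcol)

# For an unnamed list, idcol = TRUE adds an integer .id column holding
# the position of the list element each row came from
subs <- rbindlist(toy[[dfcol]], idcol = TRUE)

# .id therefore works as a row index into the parent table
subs[, (othcols) := toy[.id, othcols, with = FALSE]]
subs[, .id := NULL]
print(subs)
```

Because the parent columns are fetched by position, this should keep working even when the key columns are not unique, as long as the list column is in the same order as the rows.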
Answer 1 (score: 0)
This may help:
library(data.table)
rbindlist(setNames(data[[3]], do.call(paste, data[1:2])), idcol=TRUE)[
, c("ID", "Country") := tstrsplit(.id, " ")][, .id := NULL][]
# Value Label Year ID Country
# 1: 5 Test1 2001 1 Netherlands
# 2: 5 Test2 2002 1 Netherlands
# 3: 1 Test3 2003 1 Netherlands
# 4: 3 Test4 2004 1 Netherlands
# 5: 2 Test5 2005 1 Netherlands
# 6: 1 Test6 2006 1 Netherlands
# 7: 1 Test7 2007 1 Netherlands
# 8: 1 Test8 2008 1 Netherlands
# 9: 2 Test9 2009 1 Netherlands
#10: 5 Test10 2010 1 Netherlands
#11: 3 Test11 2011 1 Netherlands
#12: 2 Test12 2012 1 Netherlands
#13: 4 Test13 2013 1 Netherlands
#14: 5 Test14 2014 1 Netherlands
#15: 5 Test15 2015 1 Netherlands
#16: 2 Test16 2016 1 Netherlands
#17: 5 Test1 2001 2 Germany
#18: 4 Test2 2002 2 Germany
#19: 3 Test3 2003 2 Germany
#20: 2 Test4 2004 2 Germany
#21: 2 Test5 2005 2 Germany
#22: 2 Test6 2006 2 Germany
#23: 1 Test7 2007 2 Germany
#24: 1 Test8 2008 2 Germany
#25: 5 Test9 2009 2 Germany
#26: 4 Test10 2010 2 Germany
#27: 4 Test11 2011 2 Germany
#28: 4 Test12 2012 2 Germany
#29: 5 Test13 2013 2 Germany
#30: 1 Test14 2014 2 Germany
#31: 1 Test15 2015 2 Germany
#32: 3 Test16 2016 2 Germany
#33: 1 Test1 2001 3 Belgium
#34: 2 Test2 2002 3 Belgium
#35: 3 Test3 2003 3 Belgium
#36: 1 Test4 2004 3 Belgium
#37: 1 Test5 2005 3 Belgium
#38: 4 Test6 2006 3 Belgium
#39: 5 Test7 2007 3 Belgium
#40: 1 Test8 2008 3 Belgium
#41: 2 Test9 2009 3 Belgium
#42: 3 Test10 2010 3 Belgium
#43: 2 Test11 2011 3 Belgium
#44: 2 Test12 2012 3 Belgium
#45: 1 Test13 2013 3 Belgium
#46: 1 Test14 2014 3 Belgium
#47: 1 Test15 2015 3 Belgium
#48: 5 Test16 2016 3 Belgium
NOTE: 'data' is taken from the OP's own post.
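One caveat with pasting the key columns together and splitting them again on a space: a country name that itself contains a space would probably be split into too many pieces. The sketch below is one possible workaround, not part of the original answer. It uses a hypothetical table `toy` and a separator `sep` that is assumed never to occur in ID or Country.

```r
library(data.table)

# Hypothetical toy data with a country name that contains a space
toy <- data.frame(ID = c("1", "2"), Country = c("United Kingdom", "Belgium"),
                  stringsAsFactors = FALSE)
toy$Subdocuments <- list(
  data.frame(Value = c("5", "1"), Year = c(2001, 2002), stringsAsFactors = FALSE),
  data.frame(Value = "3", Year = 2001, stringsAsFactors = FALSE)
)

sep  <- "|"  # assumption: this character never appears in ID or Country
keys <- paste(toy$ID, toy$Country, sep = sep)

# Name each nested data.frame by its combined key; idcol = TRUE stores that name in .id
res <- rbindlist(setNames(toy$Subdocuments, keys), idcol = TRUE)

# fixed = TRUE treats the separator literally rather than as a regular expression
res[, c("ID", "Country") := tstrsplit(.id, sep, fixed = TRUE)][, .id := NULL]
print(res)
```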
Or using dplyr:
library(dplyr)
bind_rows(data[[3]], .id="ID") %>%
left_join(data[-3], ., by = "ID")
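The join above lines up because, for an unnamed list, bind_rows fills the .id column with the list positions "1", "2", "3", which happen to equal the ID values in the example data. A hedged variant for IDs that are not simply row positions is to name the list elements by ID first. The object `toy` below is hypothetical and assumes the IDs are unique.

```r
library(dplyr)

# Hypothetical toy data whose IDs are not row positions
toy <- data.frame(ID = c("a", "b"), Country = c("Netherlands", "Germany"),
                  stringsAsFactors = FALSE)
toy$Subdocuments <- list(
  data.frame(Value = c("5", "1"), Year = c(2001, 2002), stringsAsFactors = FALSE),
  data.frame(Value = "3", Year = 2001, stringsAsFactors = FALSE)
)

# Naming each nested data.frame by its parent ID makes .id carry the real key
subs <- bind_rows(setNames(toy$Subdocuments, toy$ID), .id = "ID")

# Join the remaining parent columns back on that key
out <- left_join(toy[c("ID", "Country")], subs, by = "ID")
print(out)
```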