This is a follow-up to another question, Extracting from Nested list to data frame.
Using the updated answer there, I get my data frame.
Then I use df <- data.frame(start = df3[5,])
So I am left with:
dput(df)
structure(list(start.X1_1 = structure(4L, .Names = "experience.start", .Label = c("",
" ", "1", "2015"), class = "factor"), start.X2_2 = structure(3L, .Names = "experience.start", .Label = c(" ",
"1", "2011"), class = "factor"), start.X3_2 = structure(3L, .Names = "experience.start", .Label = c(" ",
"1", "2007"), class = "factor"), start.X4_2 = structure(NA_integer_, .Names = "experience.start", .Label = c(" ",
"1"), class = "factor"), start.X5_2 = structure(NA_integer_, .Names = "experience.start", .Label = c(" ",
"1"), class = "factor"), start.X6_2 = structure(NA_integer_, .Names = "experience.start", .Label = c(" ",
"1"), class = "factor"), start.X7_2 = structure(NA_integer_, .Names = "experience.start", .Label = c(" ",
"1"), class = "factor"), start.X8_2 = structure(NA_integer_, .Names = "experience.start", .Label = c(" ",
"1"), class = "factor"), start.X9_2 = structure(NA_integer_, .Names = "experience.start", .Label = c(" ",
"1"), class = "factor"), start.X10_3 = structure(3L, .Names = "experience.start", .Label = c(" ",
"1", "2016", "3000"), class = "factor"), start.X11_3 = structure(3L, .Names = "experience.start", .Label = c(" ",
"1", "2015", "3000"), class = "factor"), start.X12_3 = structure(4L, .Names = "experience.start", .Label = c("",
" ", "1", "2015", "2016", "EE"), class = "factor"), start.X13_3 = structure(4L, .Names = "experience.start", .Label = c("",
" ", "1", "2014", "2015"), class = "factor"), start.X14_3 = structure(3L, .Names = "experience.start", .Label = c(" ",
"1", "2013", "2014"), class = "factor"), start.X15_3 = structure(3L, .Names = "experience.start", .Label = c(" ",
"1", "2010", "2011", "Virtusa"), class = "factor")), .Names = c("start.X1_1",
"start.X2_2", "start.X3_2", "start.X4_2", "start.X5_2", "start.X6_2",
"start.X7_2", "start.X8_2", "start.X9_2", "start.X10_3", "start.X11_3",
"start.X12_3", "start.X13_3", "start.X14_3", "start.X15_3"), row.names = "experience.start", class = "data.frame")
Now I want it in the following format:
  v1   v2   v3   v4   v5   v6   v7   v8
1 2015
2 2011 2007 null null null null null null
3 2016 2015 2015 2014 2013 2010
I can find the matching columns using the following:
sR <- function(x, n) {
  substr(x, nchar(x) - n + 1, nchar(x))
}
sR(names(df),2)
[1] "_1" "_2" "_2" "_2" "_2" "_2" "_2" "_2" "_2" "_3" "_3" "_3" "_3" "_3" "_3"
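A side note on sR(): taking the last two characters only works because every suffix here is a single digit. A regular expression that drops everything up to the last underscore does not depend on the width of the suffix. This is a minimal sketch; the column names are copied from the dput() output, and `start.X16_10` is a made-up name used only to illustrate a two-digit suffix.

```r
# Column names copied from the dput() output above
nms <- c("start.X1_1", "start.X2_2", "start.X3_2", "start.X4_2", "start.X5_2",
         "start.X6_2", "start.X7_2", "start.X8_2", "start.X9_2", "start.X10_3",
         "start.X11_3", "start.X12_3", "start.X13_3", "start.X14_3", "start.X15_3")

# Fixed-width version from the question: last n characters
sR <- function(x, n) substr(x, nchar(x) - n + 1, nchar(x))

# Regex version: remove everything up to and including the last underscore
suffix <- sub(".*_", "", nms)
print(suffix)

# Hypothetical two-digit suffix (not in the real data): the fixed width stops
# lining up, since sR() returns "_2" for one name but "10" for the other,
# while sub() returns "2" and "10"
odd <- c("start.X9_2", "start.X16_10")
sR(odd, 2)
sub(".*_", "", odd)
```

The answer below uses the same `sub('.*_', '', ...)` pattern to build its grouping variable.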
So I think from here there must be a way to get the output I want.
Or I'm sure someone will show me a better way.
Answer 0 (score: 2)
The main idea is to split the data frame by the suffix after the underscore. That way you get a list with 3 elements, one per suffix (in your case 1, 2 and 3).
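# stack() ignores factor columns, so convert them to character first
df[] <- lapply(df, as.character)
s <- stack(df)  # long format: 'values' = entries, 'ind' = original column names
# split by the number after the last underscore; keep only the 'values' column
l1 <- lapply(split(s, as.numeric(sub('.*_', '', s[, 2]))), '[', 1)
lapply(l1, head, 2)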
#$`1`
# values
#1 2015
#$`2`
# values
#2 2011
#3 2007
#$`3`
# values
#10 2016
#11 2015
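The split step can also be seen on its own without stack(). This is a minimal base R sketch that works on a plain named character vector instead of the one-row data frame; `vals` is copied by hand from the dput() above (NA where it shows NA_integer_), and the names `vals` and `groups` are only illustrative.

```r
# Values of the single row of df after converting factors to character
vals <- c(start.X1_1 = "2015",
          start.X2_2 = "2011", start.X3_2 = "2007",
          start.X4_2 = NA, start.X5_2 = NA, start.X6_2 = NA,
          start.X7_2 = NA, start.X8_2 = NA, start.X9_2 = NA,
          start.X10_3 = "2016", start.X11_3 = "2015", start.X12_3 = "2015",
          start.X13_3 = "2014", start.X14_3 = "2013", start.X15_3 = "2010")

# Group by the number after the last underscore; split() returns a list with
# one element per suffix, named "1", "2" and "3"
groups <- split(unname(vals), as.numeric(sub(".*_", "", names(vals))))

# Suffix 1 has 1 entry, suffix 2 has 8 (6 of them NA), suffix 3 has 6
lengths(groups)
```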
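Now all we need to do is cbind these 3 elements together, which is a bit tricky because they have different lengths. Fortunately, someone on SO has already solved this (see the disclaimer below).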
t(do.call(cbindPad, l1))
# 1 2 3 4 5 6 7 8
#values "2015" NA NA NA NA NA NA NA
#values "2011" "2007" NA NA NA NA NA NA
#values "2016" "2015" "2015" "2014" "2013" "2010" NA NA
Disclaimer
The function cbindPad was taken from @Joran in this post.
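Since cbindPad itself is not reproduced here, this is a hedged stand-in that produces the same shape: pad every group with NA up to the longest length, then bind the groups as rows. `pad_rbind` is a hypothetical helper written for this sketch, not Joran's function, and the list `l2` mirrors the three groups as plain character vectors.

```r
# Hypothetical helper (not the cbindPad from the linked post): pad each vector
# in a list with NA up to the longest length, then bind them as rows
pad_rbind <- function(lst) {
  n <- max(lengths(lst))
  padded <- lapply(lst, function(v) {
    length(v) <- n  # growing a vector's length fills the new slots with NA
    v
  })
  do.call(rbind, padded)
}

# The three groups from the answer, as plain character vectors
l2 <- list(`1` = "2015",
           `2` = c("2011", "2007", NA, NA, NA, NA, NA, NA),
           `3` = c("2016", "2015", "2015", "2014", "2013", "2010"))

m <- pad_rbind(l2)
colnames(m) <- paste0("v", seq_len(ncol(m)))  # v1 ... v8, as in the question
m
```

Binding as rows directly avoids the extra t() call that the cbind-based version needs.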
Alternatively, the function rbind.fill from the plyr package can be used after transposing, to give a cbind.fill kind of result.
plyr::rbind.fill(lapply(l1, function(i) as.data.frame(t(i))))
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
#1 2015 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#2 <NA> 2011 2007 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> 2016 2015 2015 2014 2013 2010
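The plyr result has 15 columns rather than 8 because t(i) turns each group's row names (the original positions 1 to 15 from stack()) into column names, and rbind.fill matches columns by name. The base R sketch below shows where those names come from; `g3` is a hand-built stand-in for the third element of l1, not taken from the original answer.

```r
# Stand-in for one element of l1: a one-column data frame whose row names
# are the original positions from stack() (here 10 to 15)
g3 <- data.frame(values = c("2016", "2015", "2015", "2014", "2013", "2010"),
                 row.names = as.character(10:15))

# Transposing the data frame keeps those row names as column names ...
colnames(t(g3))

# ... whereas transposing the bare column gives a matrix without column names,
# so as.data.frame() falls back to V1, V2, ... and rows line up by position
names(as.data.frame(t(g3[[1]])))
```

Following that reasoning, a variant such as plyr::rbind.fill(lapply(l1, function(i) as.data.frame(t(i[[1]])))) should give left-aligned columns V1 to V8; this is a suggestion, not part of the original answer.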