Question

I have a variable that looks like this:

Var
[1] 3, 4, 5     2, 4, 5     2, 4     1, 4, 5

I need to split it into a data frame that looks like this:

V1   V2   V3   V4   V5
NA   NA   3    4    5
NA   2    NA   4    5
NA   2    NA   4    NA
1    NA   NA   4    5

Unfortunately, I couldn't find a post that solves my problem. Does anyone know how I can do this? Thanks a lot in advance!

Edit: I found a solution based on your answers and posted it below.

Edit2: I made my code more efficient using Ananda's solution.

Answer 1

Using matrix indexing:

Var <- list(c(3,4,5),c(2,4,5),c(2,4),c(1,4,5))
unVar <- unlist(Var)
out <- matrix(NA, nrow=length(Var), ncol=max(unVar))

out[cbind(rep(seq_along(Var),sapply(Var,length)),unVar)] <- unVar
# and if you're using R 3.2.0 or later, lengths() lets you simplify a little:
out[cbind(rep(seq_along(Var),lengths(Var)),unVar)] <- unVar

#     [,1] [,2] [,3] [,4] [,5]
#[1,]   NA   NA    3    4    5
#[2,]   NA    2   NA    4    5
#[3,]   NA    2   NA    4   NA
#[4,]    1   NA   NA    4    5
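
The question asks for a data frame rather than a matrix. As a minimal sketch, assuming the column names V1 to V5 from the question are what is wanted, the matrix can be converted with as.data.frame(), which names the columns of an unnamed matrix V1, V2, and so on:

```r
# Rebuild the matrix from above so this block runs on its own.
Var <- list(c(3, 4, 5), c(2, 4, 5), c(2, 4), c(1, 4, 5))
unVar <- unlist(Var)
out <- matrix(NA, nrow = length(Var), ncol = max(unVar))
out[cbind(rep(seq_along(Var), lengths(Var)), unVar)] <- unVar

# Convert to a data frame; unnamed matrix columns become V1, V2, ...
df <- as.data.frame(out)
df
```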

Answer 2

Var <- list(c(3, 4, 5), c(2, 4, 5), c(2, 4), c(1, 4, 5))
M <- matrix(NA, nrow = length(Var), ncol = max(sapply(Var, max)))
# Fill one row at a time: pair row L with each of its values as the column index.
for (L in seq_along(Var)) {
  M[cbind(rep(L, length(Var[[L]])), Var[[L]])] <- Var[[L]]
}
M
#      [,1] [,2] [,3] [,4] [,5]
# [1,]   NA   NA    3    4    5
# [2,]   NA    2   NA    4    5
# [3,]   NA    2   NA    4   NA
# [4,]    1   NA   NA    4    5

Personally, my vote goes to thelatemail's version, which is basically isomorphic to this one.
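
To make the "basically isomorphic" point concrete, here is a small sketch that builds the matrix both ways and compares the results. The variable name `same` is only for illustration:

```r
Var <- list(c(3, 4, 5), c(2, 4, 5), c(2, 4), c(1, 4, 5))

# Vectorized version: a single two-column index matrix covers every cell.
unVar <- unlist(Var)
out <- matrix(NA, nrow = length(Var), ncol = max(unVar))
out[cbind(rep(seq_along(Var), lengths(Var)), unVar)] <- unVar

# Loop version: one small index matrix per row.
M <- matrix(NA, nrow = length(Var), ncol = max(sapply(Var, max)))
for (L in seq_along(Var)) {
  M[cbind(rep(L, length(Var[[L]])), Var[[L]])] <- Var[[L]]
}

# Compare the two results.
same <- identical(out, M)
same
```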

Answer 3

Judging by the OP's answer, "var" is a character string, like:

var <- c("3, 4, 5", "2, 4, 5", "2, 4", "1, 4, 5")

If that's the case, you can consider cSplit_e from my "splitstackshape" package:

library(splitstackshape)
cSplit_e(data.frame(var), "var", ",", mode = "value", drop = TRUE)
#   var_1 var_2 var_3 var_4 var_5
# 1    NA    NA     3     4     5
# 2    NA     2    NA     4     5
# 3    NA     2    NA     4    NA
# 4     1    NA    NA     4     5
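
If you would rather not add a package dependency, the same idea can be sketched in base R, assuming the strings are always comma-separated non-negative integers as in the example. This is not what cSplit_e does internally, just a combination of strsplit with the matrix indexing from the other answers; `vals` and `flat` are illustrative names:

```r
var <- c("3, 4, 5", "2, 4, 5", "2, 4", "1, 4, 5")

# Split on commas (allowing surrounding whitespace) and convert to numbers.
vals <- lapply(strsplit(var, "\\s*,\\s*"), as.numeric)

# Place each value in the column matching its own value.
flat <- unlist(vals)
out <- matrix(NA, nrow = length(vals), ncol = max(flat))
out[cbind(rep(seq_along(vals), lengths(vals)), flat)] <- flat
out
```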

If it's a list, as the other answers assume, you can use the (non-exported) numMat function from "splitstackshape", which powers cSplit_e.

var <- list(c(3,4,5), c(2,4,5), c(2,4), c(1,4,5))
splitstackshape:::numMat(var, mode = "value")
#       1  2  3 4  5
# [1,] NA NA  3 4  5
# [2,] NA  2 NA 4  5
# [3,] NA  2 NA 4 NA
# [4,]  1 NA NA 4  5

Behind the scenes, numMat is very similar to @thelatemail's answer.

If you have -99 representing NA and want to exclude those values, you can try:

var <- c("3, 4, 5", "2, -99, 4, 5", "2, 4", "1, 4, 5, -99")
splitstackshape:::numMat(
  lapply(strsplit(var, ","), function(x) as.numeric(x)[as.numeric(x) > 0]), 
  mode = "value")
#       1  2  3 4  5
# [1,] NA NA  3 4  5
# [2,] NA  2 NA 4  5
# [3,] NA  2 NA 4 NA
# [4,]  1 NA NA 4  5

Answer 4

If Var is just a vector, then I would do the following:

Var = c(3,4,5,2,4,5,2,4,1,4,5)
RowIdx = c(rep(1,3),rep(2,3),rep(3,2),rep(4,3))
DF = matrix(NA,nrow=4,ncol=5)

for (idx in 1:length(Var)){
  DF[RowIdx[idx],Var[idx]] = Var[idx]
}

Of course, if you have more data, you'd probably want to find a more automatic way of generating the row indices.
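
One way to automate the row indices, as a sketch: if you know how many values belong to each row (the vector `lens` below is an assumption, not something given in the question), rep() can generate the indices, and the loop can be replaced by a single matrix-indexed assignment:

```r
Var <- c(3, 4, 5, 2, 4, 5, 2, 4, 1, 4, 5)
lens <- c(3, 3, 2, 3)  # assumed: number of values in each row

# Repeat each row number as many times as that row has values.
RowIdx <- rep(seq_along(lens), times = lens)

DF <- matrix(NA, nrow = length(lens), ncol = max(Var))
# Each (row, column) pair in the index matrix receives the matching value.
DF[cbind(RowIdx, Var)] <- Var
DF
```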

Answer 5

I managed to find a solution based on your responses! My final solution is as follows:

# I had the additional problem that my variable was a factor, therefore I had to transform it first.
df <- data.frame(Var)
Var <- lapply(strsplit(as.character(df$Var), ", "), "[")
for(i in 1:length(Var)){
  Var[[i]] <- as.numeric(Var[[i]]) 
}

# Then I created a matrix based on thelatemail's and BondedDust's approach.
M <- matrix(NA, nrow=length(Var), ncol=max(sapply(Var,max)))

# Additionally, I had the problem that there were some lines with a single -99, which indicates a missing value for the complete line. I had some problems with this negative value. For this reason, I assigned NA's first.
for(i in 1:length(Var)){
  Var[[i]][Var[[i]] == -99] <- NA
}

# Final assignment as suggested by BondedDust.
for( L in seq(Var) ) { M [ cbind( rep( L, length(Var[[L]])), Var[[L]]) ] <- Var[[L]]}
M

I'm not sure whether this is the fastest solution, but everything works now! Thank you very much for your quick and thorough answers!
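
For comparison, the same steps can be written more compactly. This is only a sketch under assumptions: the sample data below is made up to mimic a factor with one row that contains a single -99, and the -99 values are dropped before indexing instead of being turned into NA:

```r
# Hypothetical sample data: a factor where one row is a lone -99.
Var <- factor(c("3, 4, 5", "2, 4, 5", "-99", "1, 4, 5"))

# Factor -> character -> list of numeric vectors.
vals <- lapply(strsplit(as.character(Var), ", "), as.numeric)

# Drop the missing-value code so it is never used as a column index.
vals <- lapply(vals, function(x) x[x != -99])

# Matrix indexing as in the earlier answers; empty rows simply stay NA.
flat <- unlist(vals)
M <- matrix(NA, nrow = length(vals), ncol = max(flat))
M[cbind(rep(seq_along(vals), lengths(vals)), flat)] <- flat
M
```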
