Split the values of a data frame column on specific delimiters and add them as extra rows

Posted: 2012-07-17 07:36:19

Tags: string r split dataframe

Suppose I have the following data frame:

> df <- data.frame(var1 = c("A", "B", "C", "D"),            
                 var2 = c("test", "5 | 6", "X & Y", "M | N | O"))
> df
  var1          var2
1    A          test
2    B         5 | 6
3    C         X & Y
4    D     M | N | O

How can I split the values in var2 on the | and & operators and put each piece in its own row of the same data.frame? The output should look like this:

> df
  var1          var2
1    A          test
2    B             5
3    B             6
4    C             X
5    C             Y
6    D             M
7    D             N
8    D             O

I did this with strsplit and a for loop, but I don't think that code is very good. Any ideas on how to do this in a more idiomatic R way?

2 answers:

Answer 0 (score: 6)

You can do it like this:

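# as.character() is needed when the columns are factors (the data.frame() default before R 4.0.0)
s <- strsplit(as.character(df[,2]), " \\| | & ")
cbind(var1=rep(as.character(df[,1]), sapply(s, length)), var2=unlist(s))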
     var1 var2  
[1,] "A"  "test"
[2,] "B"  "5"  
[3,] "B"  "6"  
[4,] "C"  "X"  
[5,] "C"  "Y"  
[6,] "D"  "M"  
[7,] "D"  "N" 
[8,] "D"  "O"  
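Note that `cbind()` here returns a character matrix, not the data.frame the question asks for. A minimal sketch of the same strsplit/rep idea that builds a data.frame instead, assuming R >= 3.2.0 (for `lengths()`) and using `stringsAsFactors = FALSE` so both columns stay character:

```r
df <- data.frame(var1 = c("A", "B", "C", "D"),
                 var2 = c("test", "5 | 6", "X & Y", "M | N | O"),
                 stringsAsFactors = FALSE)

# Split on " | " or " & "; the pipe is escaped because an unescaped | means alternation in a regex
s <- strsplit(as.character(df$var2), " \\| | & ")

# Repeat each var1 value once per piece, then flatten the pieces into one column
out <- data.frame(var1 = rep(df$var1, lengths(s)),
                  var2 = unlist(s),
                  stringsAsFactors = FALSE)
out
```

`lengths(s)` is a shorter equivalent of `sapply(s, length)`; values with no delimiter (such as "test") come back as a single piece and therefore keep exactly one row.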

Answer 1 (score: 1)

Another approach is to use cSplit from my "splitstackshape" package:

library(splitstackshape)
cSplit(df, "var2", "[|&]", "long", fixed = FALSE)[var2_new != ""]
#    var1 var2_new
# 1:    A     test
# 2:    B        5
# 3:    B        6
# 4:    C        X
# 5:    C        Y
# 6:    D        M
# 7:    D        N
# 8:    D        O
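For readers who would rather not add a package dependency, here is a base-R sketch of the same "long" reshaping that, like the cSplit call, repeats every other column of the data frame. `split_to_rows` is a hypothetical helper name, not part of splitstackshape or base R; it assumes R >= 3.2.0 (for `trimws()` and `lengths()`) and, like the `[var2_new != ""]` filter above, drops empty pieces:

```r
# Hypothetical helper (not from any package): split column `col` of `data`
# on the regex `sep` and return one row per piece, repeating the other columns.
split_to_rows <- function(data, col, sep) {
  pieces <- strsplit(as.character(data[[col]]), sep)
  pieces <- lapply(pieces, trimws)                  # strip spaces around each piece
  pieces <- lapply(pieces, function(p) p[p != ""])  # drop empty pieces
  idx <- rep(seq_len(nrow(data)), lengths(pieces))  # row i repeated once per piece
  out <- data[idx, , drop = FALSE]                  # repeat the original rows
  out[[col]] <- unlist(pieces)                      # replace the column with the pieces
  rownames(out) <- NULL                             # renumber rows 1..n
  out
}

df <- data.frame(var1 = c("A", "B", "C", "D"),
                 var2 = c("test", "5 | 6", "X & Y", "M | N | O"),
                 stringsAsFactors = FALSE)
res <- split_to_rows(df, "var2", "[|&]")
res
```

Because the pieces are trimmed after splitting, the simpler character class `[|&]` works here without spelling out the surrounding spaces in the pattern.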