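Splitting a column without recycling values

Time: 2012-03-30 13:45:22

Tags: r

I have a data frame with a single column that I want to split into multiple columns, but the number of pieces varies from row to row.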

Var1
====
A/B
A/B/C
C/B
A/C/D/E

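I tried colsplit(df$Var1,split="/",names=c("Var1","Var2","Var3","Var4")), but in rows with fewer than 4 values the values get repeated to fill the remaining columns.

Borrowing from Hansi's answer, the desired output would be: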

     Var1 Var2 Var3 Var4
[1,] "A"  "B"  NA   NA  
[2,] "A"  "B"  "C"  NA  
[3,] "C"  "B"  NA   NA  
[4,] "A"  "C"  "D"  "E" 

3 answers:

Answer 0: (score: 2)

> read.table(text=as.character(df$Var1), sep="/", fill=TRUE)
  V1 V2 V3 V4
1  A  B      
2  A  B  C   
3  C  B      
4  A  C  D  E

You can use colClasses="character" to preserve leading zeros in numeric-only fields:
a <- data.frame(Var1=c("01/B","04/B/C","0098/B","8708/C/D/E"))
read.table(text=as.character(a$Var1), sep="/", fill=TRUE, colClasses="character")
    V1 V2 V3 V4
1   01  B      
2   04  B  C   
3 0098  B      
4 8708  C  D  E
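
The output above pads short rows with empty strings, while the question asks for NA. A minimal sketch of one way to close that gap, assuming it is acceptable to treat every empty field as missing: pass na.strings="" and supply the column names through col.names. The names in splitNames come from the question; the choice of na.strings is an assumption, not part of the original answer.

```r
# Sample data from the question
a <- data.frame(Var1 = c("A/B", "A/B/C", "C/B", "A/C/D/E"))
splitNames <- c("Var1", "Var2", "Var3", "Var4")

# fill = TRUE pads short rows with blank fields;
# na.strings = "" (an assumption here) turns those blanks into NA;
# colClasses = "character" keeps every field as text
out <- read.table(text = as.character(a$Var1), sep = "/", fill = TRUE,
                  colClasses = "character", na.strings = "",
                  col.names = splitNames)
print(out)
```

Supplying col.names also sidesteps a documented limitation: without it, read.table infers the number of columns from the first five lines of input only, so a longer row further down could be mishandled.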

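Answer 1: (score: 1)

If I understand your goal correctly, here is one possible solution. I'm sure there are better ways to do this, but this is the first one that came to mind: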

a <- data.frame(Var1=c("A/B","A/B/C","C/B","A/C/D/E"))
splitNames <- c("Var1","Var2","Var3","Var4")

# R> a
     # Var1
# 1     A/B
# 2   A/B/C
# 3     C/B
# 4 A/C/D/E

b <- t(apply(a,1,function(x){
    temp <- unlist(strsplit(x,"/"));
    return(c(temp,rep(NA,max(0,length(splitNames)-length(temp)))))
}))
colnames(b) <- splitNames

# R> b
     # Var1 Var2 Var3 Var4
# [1,] "A"  "B"  NA   NA  
# [2,] "A"  "B"  "C"  NA  
# [3,] "C"  "B"  NA   NA  
# [4,] "A"  "C"  "D"  "E" 

Answer 2: (score: 0)

I don't know of a function that solves this directly, but you can easily do it with standard R commands:

# Here are your data
df <- data.frame(Var1=c("A/B", "A/B/C", "C/B", "A/C/D/E"), stringsAsFactors=FALSE)

# Split
rows <- strsplit(df$Var1, split="/")

# Maximum amount of columns
columnCount <- max(sapply(rows, length))

# Fill with NA
rows <- lapply(rows, `length<-`, columnCount)

# Coerce to data.frame
out <- as.data.frame(rows)

# Transpose
out <- t(out)

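Since this relies on strsplit, every value comes back as a character string, so you may need to do some type conversion afterwards. See type.con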
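
A rough sketch of the type-conversion step, assuming the truncated reference above points to base R's type.convert (that reading is a guess). The sample data is hypothetical: its first field is numeric, so the conversion has a visible effect. Binding the rows with do.call(rbind, ...) is a substitution for the as.data.frame and transpose steps above, not the answer's own method.

```r
# Hypothetical sample data: numeric first field, text in the rest
df <- data.frame(Var1 = c("1/B", "2/B/C", "30/B", "4/C/D/E"),
                 stringsAsFactors = FALSE)

# Split, then pad every row with NA up to the longest row
rows <- strsplit(df$Var1, split = "/")
columnCount <- max(sapply(rows, length))
rows <- lapply(rows, `length<-`, columnCount)

# Stack the padded rows into a character matrix, then a data frame
out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(out) <- paste0("Var", seq_len(columnCount))

# Convert each column to its natural type; as.is = TRUE keeps text as character
out[] <- lapply(out, type.convert, as.is = TRUE)
str(out)
```

Converting column by column means one non-numeric value leaves its whole column as character, which is usually the safer outcome for mixed fields.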