Using the colsplit function on many columns

Time: 2014-10-19 14:03:55

Tags: r

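I have a matrix with 500 rows and 1000 columns. Each column holds comma-separated elements (three per cell in the sample below), and I need to remove the commas.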

The data looks like this:

     1            2          3          4    ...  1000
1  12,1,20   14,15,12    10,10,20    1,0,10 ... 1,5,3
2  12,1,20   14,15,12    10,10,20    1,0,10 ... 1,5,3
3  12,1,20   14,15,12    10,10,20    1,0,10 ... 1,5,3
.
.
500  12,1,20   14,15,12    10,10,20    1,0,10 ... 1,5,3

My code is:

mat=matrix(data=NA, nrow=257, ncol=3)
n=1000
k=500
for(i in 1:n){
mat[i]<-colsplit(as.character(data[,i]), "," , c("a","b","c")) 
}

It does not work; something is missing in my loop. Can anyone help me fix it? Thanks.

1 Answer:

Answer 0: (score: 2)

If you want to create new columns using "," as the delimiter:

library(data.table)
library(splitstackshape)

df1 <- cSplit(df, 1:ncol(df), sep=",")[,lapply(.SD, as.numeric)]
df1
#    X1_1 X1_2 X1_3 X2_1 X2_2 X2_3 X3_1 X3_2 X3_3 X4_1 X4_2 X4_3
#1:   12    1   20   14   15   12   10   10   20    1    0   10
#2:   12    1   20   14   15   12   10   10   20    1    0   10
#3:   12    1   20   14   15   12   10   10   20    1    0   10

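Alternatively, use cSplit_f, which is faster for rectangular data, that is, when every cell splits into the same number of values (based on a comment by @Ananda Mahto, the author of the splitstackshape package):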

 cSplit_f(df, 1:ncol(df), sep=",")[,lapply(.SD, as.numeric)]

str(df1)
#   Classes ‘data.table’ and 'data.frame':  3 obs. of  12 variables:
#  $ X1_1: num  12 12 12
#  $ X1_2: num  1 1 1
#  $ X1_3: num  20 20 20
#  $ X2_1: num  14 14 14
#  $ X2_2: num  15 15 15
#  $ X2_3: num  12 12 12
#  $ X3_1: num  10 10 10
#  $ X3_2: num  10 10 10
#  $ X3_3: num  20 20 20
#  $ X4_1: num  1 1 1
#  $ X4_2: num  0 0 0
#  $ X4_3: num  10 10 10
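
For comparison, the same split can be sketched in base R without any packages. This is not part of the original answer; `split_col` is a hypothetical helper name, and the sketch assumes every cell contains the same number of comma-separated values (otherwise `rbind` would recycle the shorter pieces). The sample data is copied from the Data section of this answer.

```r
# Sample data, copied from the "Data" section of this answer
df <- data.frame(
  X1 = c("12,1,20", "12,1,20", "12,1,20"),
  X2 = c("14,15,12", "14,15,12", "14,15,12"),
  X3 = c("10,10,20", "10,10,20", "10,10,20"),
  X4 = c("1,0,10", "1,0,10", "1,0,10"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split one column on "," and return a numeric
# matrix with one column per value, named <name>_1, <name>_2, ...
split_col <- function(x, name) {
  parts <- do.call(rbind, strsplit(as.character(x), ",", fixed = TRUE))
  storage.mode(parts) <- "double"   # character -> numeric
  colnames(parts) <- paste(name, seq_len(ncol(parts)), sep = "_")
  parts
}

# Apply the helper to every column, then bind the pieces side by side
pieces <- Map(split_col, df, names(df))
res <- as.data.frame(do.call(cbind, pieces))
str(res)
```

The column names follow the same `X1_1`, `X1_2`, ... pattern as the cSplit output above, so the two results are easy to compare.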

Data

df <- structure(list(X1 = c("12,1,20", "12,1,20", "12,1,20"), X2 = c("14,15,12", 
 "14,15,12", "14,15,12"), X3 = c("10,10,20", "10,10,20", "10,10,20"
 ), X4 = c("1,0,10", "1,0,10", "1,0,10")), .Names = c("X1", "X2", 
 "X3", "X4"), class = "data.frame", row.names = c("1", "2", "3"
))
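
As for the loop in the question: `mat` is preallocated as 257 x 3, which does not match the data, and `mat[i]` addresses a single matrix element, which cannot hold the three-column data frame that colsplit returns. A minimal sketch of one way to repair it, assuming the reshape2 package (which provides colsplit) is installed, is to collect the pieces in a list and bind them at the end. The small matrix `data` below is a stand-in built from the sample values above, and the `V<i>_a` style column names are only an illustrative choice.

```r
library(reshape2)  # provides colsplit(string, pattern, names)

# Stand-in for the questioner's matrix: 3 rows x 4 columns of strings
data <- matrix(
  c("12,1,20", "12,1,20", "12,1,20",
    "14,15,12", "14,15,12", "14,15,12",
    "10,10,20", "10,10,20", "10,10,20",
    "1,0,10", "1,0,10", "1,0,10"),
  nrow = 3
)

# One list slot per input column; each slot receives a 3-column data frame
res_list <- vector("list", ncol(data))
for (i in seq_len(ncol(data))) {
  res_list[[i]] <- colsplit(
    as.character(data[, i]), ",",
    paste0("V", i, "_", c("a", "b", "c"))  # unique names per input column
  )
}

# Bind all pieces side by side into a single data frame
res <- do.call(cbind, res_list)
dim(res)
```

Using `seq_len(ncol(data))` instead of a hard-coded `1:1000` keeps the loop in step with whatever matrix is actually passed in.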