R:gsub和str_split_fixed在data.tables中

时间:2017-05-27 13:18:57

标签: r data.table gsub strsplit

我正在从data.frame“转换”到data.table

我现在有一个data.table:

library(data.table)


DT = data.table(ID = c("ab_cd.de","ab_ci.de","fb_cd.de","xy_cd.de"))
DT

         ID
1: ab_cd.de
2: ab_ci.de
3: fb_cd.de
4: xy_cd.de  

new_DT<- data.table(matrix(ncol = 2))
colnames(new_DT)<- c("test1", "test2")

我想首先:在每个条目之后删除“.de”,然后在下一步中用下划线分隔每个条目,并将输出保存在两个新列中。最终输出应如下所示:

   test1 test2
1    ab    cd
2    ab    ci
3    fb    cd
4    xy    cd

在data.frame中我做了:

df = data.frame(ID = c("ab_cd.de","ab_ci.de","fb_cd.de","xy_cd.de"))
df

         ID
1: ab_cd.de
2: ab_ci.de
3: fb_cd.de
4: xy_cd.de


df[,1] <- gsub(".de", "", df[,1], fixed=FALSE)
df

      ID
1: ab_cd
2: ab_ci
3: fb_cd
4: xy_cd



 n <- 1
for (i in (1:length(df[,1]))){
    new_df[n,] <-str_split_fixed(df[i,1], "_", 2)
    n <- n+1
}
new_df

  test1 test2
1    ab    cd
2    ab    ci
3    fb    cd
4    xy    cd

感谢任何帮助!

2 个答案:

答案 0 :(得分:2)

使用tstrsplit删除后缀( .de )后,您可以使用sub将列拆分为两个:

DT[, c("test1", "test2") := tstrsplit(sub("\\.de", "", ID), "_")][, ID := NULL][]

#   test1 test2
#1:    ab    cd
#2:    ab    ci
#3:    fb    cd
#4:    xy    cd

答案 1 :(得分:1)

我们可以使用extract

中的tidyr
library(tidyr)
df %>% 
   extract(ID, into = c('test1', 'test2'), '([^_]+)_([^.]+).*')
#  test1 test2
#1    ab    cd
#2    ab    ci
#3    fb    cd
#4    xy    cd

或使用data.table

library(data.table)
DT[, .(test1 = sub('_.*', '', ID), test2 = sub('[^_]+_([^.]+)\\..*', '\\1', ID))]
#   test1 test2
#1:    ab    cd
#2:    ab    ci
#3:    fb    cd
#4:    xy    cd