I am in the process of "switching" from data.frame to data.table.
I now have a data.table:
library(data.table)
DT = data.table(ID = c("ab_cd.de","ab_ci.de","fb_cd.de","xy_cd.de"))
DT
ID
1: ab_cd.de
2: ab_ci.de
3: fb_cd.de
4: xy_cd.de
new_DT<- data.table(matrix(ncol = 2))
colnames(new_DT)<- c("test1", "test2")
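First I want to remove the ".de" at the end of each entry, then in a second step split each entry at the underscore and store the result in two new columns. The final output should look like this: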
test1 test2
1 ab cd
2 ab ci
3 fb cd
4 xy cd
With a data.frame I did the following:
df = data.frame(ID = c("ab_cd.de","ab_ci.de","fb_cd.de","xy_cd.de"))
df
ID
1 ab_cd.de
2 ab_ci.de
3 fb_cd.de
4 xy_cd.de
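# fixed = TRUE treats ".de" as a literal string rather than a regex, where "." would match any character
df[,1] <- gsub(".de", "", df[,1], fixed=TRUE)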
df
ID
1 ab_cd
2 ab_ci
3 fb_cd
4 xy_cd
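library(stringr)  # provides str_split_fixed
# pre-allocate the result so that rows can be filled in one by one
new_df <- data.frame(test1 = character(nrow(df)), test2 = character(nrow(df)), stringsAsFactors = FALSE)
for (i in seq_len(nrow(df))) {
new_df[i,] <- str_split_fixed(df[i,1], "_", 2)
}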
new_df
test1 test2
1 ab cd
2 ab ci
3 fb cd
4 xy cd
Any help is appreciated!
Answer 0: (score: 2)
After removing the suffix (.de) with sub, you can use tstrsplit to split the column into two:
DT[, c("test1", "test2") := tstrsplit(sub("\\.de", "", ID), "_")][, ID := NULL][]
# test1 test2
#1: ab cd
#2: ab ci
#3: fb cd
#4: xy cd
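To see what each step contributes, the one-liner above can be unpacked into separate statements. This is only a sketch on the question's sample data; the intermediate names stripped and parts are made up for illustration, and the pattern is anchored with $ so that only a trailing .de is removed, which is a slightly stricter assumption than the pattern in the answer.

```r
library(data.table)

DT <- data.table(ID = c("ab_cd.de", "ab_ci.de", "fb_cd.de", "xy_cd.de"))

# Step 1: drop the trailing ".de" (the dot is escaped so it matches literally)
stripped <- sub("\\.de$", "", DT$ID)

# Step 2: split on "_" and transpose, giving a list with one vector per new column
parts <- tstrsplit(stripped, "_", fixed = TRUE)

# Step 3: add both vectors as new columns by reference
DT[, c("test1", "test2") := parts]
```

Because := modifies DT in place, no reassignment is needed; the original ID column stays until it is dropped with ID := NULL as in the answer.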
Answer 1: (score: 1)
We can use extract from tidyr:
library(tidyr)
df %>%
extract(ID, into = c('test1', 'test2'), '([^_]+)_([^.]+).*')
# test1 test2
#1 ab cd
#2 ab ci
#3 fb cd
#4 xy cd
Or with data.table:
library(data.table)
DT[, .(test1 = sub('_.*', '', ID), test2 = sub('[^_]+_([^.]+)\\..*', '\\1', ID))]
# test1 test2
#1: ab cd
#2: ab ci
#3: fb cd
#4: xy cd
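The two sub calls in the data.table version can be tried on a plain character vector to see how the patterns work. The sketch below uses base R only; the vector name ids is an assumption made for illustration.

```r
ids <- c("ab_cd.de", "ab_ci.de", "fb_cd.de", "xy_cd.de")

# Everything from the first "_" onward is removed, leaving the prefix
test1 <- sub("_.*", "", ids)

# "[^_]+_" consumes the prefix and the underscore, "([^.]+)" captures the text
# up to the first ".", and "\\..*" consumes the rest; "\\1" keeps only the capture
test2 <- sub("[^_]+_([^.]+)\\..*", "\\1", ids)
```

Note that sub returns an entry unchanged when the pattern does not match, so IDs without an underscore or a dot would pass through as they are.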