Dynamically assigning the number of splits in data.table tstrsplit

Date: 2015-10-18 16:15:28

Tags: r split data.table

In data.table v1.9.6, you can split the values of one column into several new columns like this:

library(data.table)
DT = data.table(x=c("A/B", "A", "B"), y=1:3)
DT[, c("c1", "c2") := tstrsplit(x, "/", fixed=TRUE)][]

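The number of splits required (2 in the example above) is not known in advance. Once the number of splits is known, how can the required variable names be generated?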

n = 2  # desired number of splits
# naive attempt to build required string
m = paste0("'", "myvar", 1:n, "'", collapse = ",")
m = paste0("c(", m, ")" )

# [1] "c('myvar1','myvar2')"


DT[, m := tstrsplit(x, "/", fixed=TRUE)][]  # doesn't work

2 answers:

Answer 0 (score: 5)

There are two approaches. The first is strongly recommended:

#one
n=2
DT[, paste0("myvar", 1:n) := tstrsplit(x, "/", fixed=TRUE)][]
#     x y myvar1 myvar2
#1: A/B 1      A      B
#2:   A 2      A     NA
#3:   B 3      B     NA

#two
DT[, eval(parse(text=m)) := tstrsplit(x, "/", fixed=TRUE)][]
#     x y myvar1 myvar2
#1: A/B 1      A      B
#2:   A 2      A     NA
#3:   B 3      B     NA 
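
The first approach works because the left-hand side of `:=` accepts an expression that evaluates to a character vector of column names. A closely related idiom, sketched below, stores the names in a variable and wraps it in parentheses so that data.table evaluates the variable rather than creating a column with that name. The variable name `cols` is only an illustrative choice.

```r
library(data.table)

DT <- data.table(x = c("A/B", "A", "B"), y = 1:3)

n <- 2                               # number of splits, assumed known here
cols <- paste0("myvar", seq_len(n))  # illustrative variable holding the new names

# The parentheses tell data.table to evaluate `cols`
# instead of creating a single column literally named "cols".
DT[, (cols) := tstrsplit(x, "/", fixed = TRUE)]
print(DT)
```

This avoids building and parsing a string of code, which is why `eval(parse(...))` is usually the less attractive option.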

Extra

If you do not know the number of splits in advance:

splits <- max(lengths(strsplit(DT$x, "/", fixed=TRUE)))
DT[, paste0("myvar", 1:splits) := tstrsplit(x, "/", fixed=TRUE)][]
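
As a possible variation, the string only needs to be split once: `tstrsplit()` returns a list with one element per resulting column, so the length of that list already gives the number of splits. The sketch below assumes a slightly larger example table, and the names `parts` and `new_cols` are illustrative.

```r
library(data.table)

# Example data with an uneven number of pieces per row (assumed for illustration)
DT <- data.table(x = c("A/B/C", "A", "B/D"), y = 1:3)

# Split once; the result is a list with one vector per new column
parts <- tstrsplit(DT$x, "/", fixed = TRUE)

# Derive the column names from the number of list elements
new_cols <- paste0("myvar", seq_along(parts))

# Assign the list of vectors to the new columns by reference
DT[, (new_cols) := parts]
print(DT)
```

Rows with fewer pieces are padded with `NA`, as in the outputs shown earlier.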

Answer 1 (score: 1)

Another simple approach. Instead of creating extra columns, you can stack the split strings in a single column:

DT = data.table(x=c("A/B", "A", "B"), y=1:3)

DT1 <- DT[, .(new=tstrsplit(x, "/", fixed=TRUE)), by=y]
DT1

#    y new
# 1: 1   A
# 2: 1   B
# 3: 2   A
# 4: 3   B
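
Because `tstrsplit()` returns a list, the `new` column produced this way is likely to be stored as a list column rather than a plain character column. If a character column is preferred, one option is to use base R's `strsplit()` with `unlist()` inside the grouped call. This is a minimal sketch of that variant, and `DT_long` is an illustrative name.

```r
library(data.table)

DT <- data.table(x = c("A/B", "A", "B"), y = 1:3)

# Split each x within its y group and flatten the pieces into a character vector
DT_long <- DT[, .(new = unlist(strsplit(x, "/", fixed = TRUE))), by = y]
print(DT_long)
```

Grouping by `y` assumes that `y` uniquely identifies each row, as it does in this example.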