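Split a column on multiple delimiters, keeping the delimiters

Date: 2018-12-11 00:12:01

Tags: r regex data.table strsplit

How can I split a character column into 3 columns, using %, - and + as the possible delimiters, and keep the delimiters in the new columns?

Example data: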

library(data.table)
data <- data.table(x=c("92.1%+100-200","90.4%-1000+200", "92.8%-200+100", "99.2%-500-200","90.1%+500-200"))

Desired output:

data.desired <- data.table(x1=c("92.1%", "90.4%", "92.8%","99.2%","90.1%")
                           , x2=c("+100","-1000","-200","-500","+500")
                           , x3=c("-200","+200","+100","-200","-200"))

Happy to award points for this, and thanks in advance for any help!

3 answers:

Answer 0 (score: 3)

We can split with separate from tidyr, using a positive lookahead to keep the delimiters:

library(tidyr)  # also re-exports %>%

data %>% separate(x, c("x1", "x2", "x3"), sep = "(?=\\+|-)")
#       x1    x2   x3
# 1: 92.1%  +100 -200
# 2: 90.4% -1000 +200
# 3: 92.8%  -200 +100
# 4: 99.2%  -500 -200
# 5: 90.1%  +500 -200

For comparison, note that splitting on plain \\+|- consumes the delimiters:

data %>% separate(x, c("x1", "x2", "x3"), sep = "\\+|-")
#       x1   x2  x3
# 1: 92.1%  100 200
# 2: 90.4% 1000 200
# 3: 92.8%  200 100
# 4: 99.2%  500 200
# 5: 90.1%  500 200

With (?=\\+|-), the pattern itself matches no characters: we split on the "nothing" that is immediately followed by + or -, so each sign stays with the piece after it.
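To see the difference between consuming a delimiter and matching the empty position in front of it, here is a minimal base R sketch. It uses gsub to mark the match positions with a "|" instead of splitting. The variable names (consumed, kept) are only illustrative.

```r
x <- "92.1%+100-200"

# Plain character class: each match IS the sign, so the sign is replaced.
consumed <- gsub("[+-]", "|", x)

# Lookahead: each match is the empty position before a sign, so the
# marker is inserted there and the sign itself is left in place.
kept <- gsub("(?=[+-])", "|", x, perl = TRUE)

print(consumed)  # "92.1%|100|200"
print(kept)      # "92.1%|+100|-200"
```

Lookarounds need perl = TRUE in base R functions, because the default regex engine does not support them.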

Answer 1 (score: 2)

In data.table, the equivalent is tstrsplit:

data[, c("x1","x2","x3") := tstrsplit(x, "(?<=.)(?=[+-])", perl=TRUE) ]
data
#                x    x1    x2   x3
#1:  92.1%+100-200 92.1%  +100 -200
#2: 90.4%-1000+200 90.4% -1000 +200
#3:  92.8%-200+100 92.8%  -200 +100
#4:  99.2%-500-200 99.2%  -500 -200
#5:  90.1%+500-200 90.1%  +500 -200

Answer 2 (score: 2)

Here is an option using base R:
cbind(data, read.csv(text = gsub("(?=[+-])", ",", data$x, perl = TRUE), 
    header = FALSE, stringsAsFactors = FALSE, col.names = c('x1', 'x2', 'x3')))
#                x    x1    x2   x3
#1:  92.1%+100-200 92.1%   100 -200
#2: 90.4%-1000+200 90.4% -1000  200
#3:  92.8%-200+100 92.8%  -200  100
#4:  99.2%-500-200 99.2%  -500 -200
#5:  90.1%+500-200 90.1%   500 -200
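Note that in the output above the + signs are gone, because read.csv parses x2 and x3 as numbers. If the signs should be kept as in the desired output, one possible variation (a sketch, not part of the original answer) is to read every column as character via colClasses. The names csv_lines and parts are only illustrative.

```r
x <- c("92.1%+100-200", "90.4%-1000+200", "92.8%-200+100",
       "99.2%-500-200", "90.1%+500-200")

# Insert a comma before every sign, turning each string into a CSV line.
csv_lines <- gsub("(?=[+-])", ",", x, perl = TRUE)

# colClasses = "character" stops read.csv from converting "+100" to 100.
parts <- read.csv(text = csv_lines, header = FALSE,
                  colClasses = "character",
                  col.names = c("x1", "x2", "x3"))

print(parts)
```

The result can be bound to the original table with cbind(data, parts), as in the answer above.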