Splitting on a specific character after a regex match

Time: 2018-09-11 14:13:50

Tags: r regex string split

I have a string that looks like this:

s = "discount rates of 5% to 10%, and growth rates of 2% to 3%"

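I want to split the string on the character that follows the first range, which in this case is the comma after "10%". The output would look like this: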

s = c("discount rates of 5% to 10%", " and growth rates of 2% to 3%")

The regular expression I use to extract the ranges is:

(\\$*\\d*\\.\\d+[%x] (to|and) \\$*\\d*\\.\\d+[%x])

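So far it has worked well (some ranges end in "x" rather than "%"), but I don't want to split on that regex itself; I need to split on the character immediately after the match. If it is easier, I could also split on the nearest space, so that the output looks like this: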

s = c("discount rates of 5% to 10%,", "and growth rates of 2% to 3%")

The reason I want to split after the regex match is that I want to keep both matches (here "5% to 10%" and "2% to 3%") but put them in separate strings.

2 answers:

Answer 0: (score: 1)

How about this:

s1 <- "discount rates of 5% to 10%, and growth rates of 2% to 3%"
s2 <- "discount rates of 5% to 10x, and growth rates of 2% to 3%"
sub("\\s*,.*", "", s1) # first range
sub(sub("\\s*,.*", "", s1), "", s1) # remainder after the first range (starts with the comma)
substring(sub(sub("\\s*,.*", "", s1), "", s1), 1, 1) # first character of the remainder, i.e. the one right after the first range
### solution:
unlist(strsplit(s1, substring(sub(sub("\\s*,.*","", s1), "", s1), 1, 1))) # case 1
#[1] "discount rates of 5% to 10%"   " and growth rates of 2% to 3%"
unlist(strsplit(s2, substring(sub(sub("\\s*,.*","", s2), "", s2), 1, 1))) # case 2
#[1] "discount rates of 5% to 10x"   " and growth rates of 2% to 3%"
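
For these two inputs, the approach above ends up splitting on the comma. A more compact variant, offered only as a sketch, is to split on a comma that directly follows a digit plus "%" or "x" by using a lookbehind with perl = TRUE. The helper name split_after_range is hypothetical, and the pattern assumes the character after the range is always a comma.

```r
s1 <- "discount rates of 5% to 10%, and growth rates of 2% to 3%"
s2 <- "discount rates of 5% to 10x, and growth rates of 2% to 3%"

# Hypothetical helper: split on a comma only when it directly follows a
# digit plus "%" or "x". The lookbehind (?<=...) is zero-width, so only
# the comma itself is consumed by the split.
split_after_range <- function(x) {
  unlist(strsplit(x, "(?<=\\d[%x]),", perl = TRUE))
}

split_after_range(s1)
split_after_range(s2)
```

Because the lookbehind is not part of the match, the "10%" or "10x" stays attached to the first piece and only the comma is dropped.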

Answer 1: (score: 1)

My solution may be a bit roundabout, but it may be good enough:

ss<-gsub("(\\d+[%x],)", "\\1XX",s)
s<-unlist(strsplit(ss, split="XX"))

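This assumes that "XX" does not actually appear anywhere in the string, so replace it with a sufficiently unlikely string if it might. (I also simplified the regex to assume that a number is always followed by a percent sign or an x, plus a comma.)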
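
If a placeholder string is a concern, one possible alternative is to locate the end of the first range with regexpr and cut the string there with substring, dropping the single character that follows. This is a minimal sketch, not part of the answer above. The names range_re and split_after_first_range are hypothetical, and the pattern is an assumed variant of the regex in the question with the decimal point made optional so that whole numbers such as "5%" match.

```r
s <- "discount rates of 5% to 10%, and growth rates of 2% to 3%"

# Assumed pattern: the question's regex with an optional decimal point.
range_re <- "\\$*\\d*\\.?\\d+[%x] (to|and) \\$*\\d*\\.?\\d+[%x]"

# Hypothetical helper: split x right after the first range, discarding
# the one character that immediately follows it.
split_after_first_range <- function(x, pattern = range_re) {
  m <- regexpr(pattern, x, perl = TRUE)
  start <- as.integer(m)
  if (start == -1L) return(x)               # no range found: return unchanged
  end <- start + attr(m, "match.length") - 1L  # index of the range's last character
  c(substring(x, 1, end),                   # text up to and including the range
    substring(x, end + 2L))                 # text after the following character
}

split_after_first_range(s)
```

This splits on whatever character follows the first range, whether it is a comma or something else, and it does not depend on any marker string being absent from the input.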