Question

I have a string that looks like this:

s = "discount rates of 5% to 10%, and growth rates of 2% to 3%"

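I want to split the string on the character that follows the first range, which in this case is the comma after "10%". The output would look like this: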

s = c("discount rates of 5% to 10%", " and growth rates of 2% to 3%")

The regular expression I use to extract the ranges is:

(\\$*\\d*\\.\\d+[%x] (to|and) \\$*\\d*\\.\\d+[%x])

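So far it has worked well (some ranges end in "x" instead of "%"). However, I don't want to split on the regex match itself; I need to split on the character that immediately follows it. If it is easier, I could also split on the nearest space, so that the output looks like this: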

s = c("discount rates of 5% to 10%,", "and growth rates of 2% to 3%")

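The reason I want to split on the character after the regex match is that I want to keep both matches (here "5% to 10%" and "2% to 3%") but put them in separate strings.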

Answer 1

How about this:

s1 <- "discount rates of 5% to 10%, and growth rates of 2% to 3%"
s2 <- "discount rates of 5% to 10x, and growth rates of 2% to 3%"
sub("\\s*,.*", "", s1) # text before the first comma (contains the first range)
sub(sub("\\s*,.*", "", s1), "", s1) # remainder of the string, starting at the comma
substring(sub(sub("\\s*,.*", "", s1), "", s1), 1, 1) # first character of the remainder, used as the delimiter
### solution:
unlist(strsplit(s1, substring(sub(sub("\\s*,.*","", s1), "", s1), 1, 1))) # case 1
#[1] "discount rates of 5% to 10%"   " and growth rates of 2% to 3%"
unlist(strsplit(s2, substring(sub(sub("\\s*,.*","", s2), "", s2), 1, 1))) # case 2
#[1] "discount rates of 5% to 10x"   " and growth rates of 2% to 3%"
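
As an alternative that ties the split to the range itself rather than to the first comma, the position of the first match can be taken from `regexpr()` and the string cut around the character that follows it. This is only a sketch: `range_pat` is an assumed variant of the question's pattern that also accepts whole numbers such as "5%" (the pattern as posted requires a decimal point), and the names `range_pat`, `end_pos` and `parts` are made up for illustration.

```r
s <- "discount rates of 5% to 10%, and growth rates of 2% to 3%"

# Assumed pattern: number (optional decimals) ending in % or x, "to"/"and", then the same again
range_pat <- "\\$*\\d+(\\.\\d+)?[%x] (to|and) \\$*\\d+(\\.\\d+)?[%x]"

m <- regexpr(range_pat, s)                 # first match only; -1 if there is none
start <- as.integer(m)
end_pos <- start + attr(m, "match.length") - 1  # index of the last character of the match

if (start == -1) {
  parts <- s                               # no range found: leave the string whole
} else {
  parts <- c(substr(s, 1, end_pos),        # up to and including the first range
             substr(s, end_pos + 2, nchar(s)))  # skip the single character after the range
}
parts
#[1] "discount rates of 5% to 10%"   " and growth rates of 2% to 3%"
```

Because the cut is positional, it does not matter which character follows the range; whatever it is gets dropped.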

Answer 2

My solution may be a bit roundabout, but it may be good enough:

ss<-gsub("(\\d+[%x],)", "\\1XX",s)
s<-unlist(strsplit(ss, split="XX"))

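This assumes that "XX" does not actually occur anywhere in the string, so replace it with some other unlikely string if it might. (I also simplified the regex to assume that the number is always followed by a percent sign or an x, and then a comma.)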
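
If a placeholder string feels fragile, a lookbehind may avoid it altogether. The sketch below makes the same simplifying assumption as above (a digit, then "%" or "x", then a comma) and uses `strsplit()` with `perl = TRUE`; the variable names are illustrative only. Unlike the marker approach, it drops the comma instead of keeping it attached to the first piece, and it splits at every such comma, not just the first.

```r
s1 <- "discount rates of 5% to 10%, and growth rates of 2% to 3%"
s2 <- "discount rates of 5% to 10x, and growth rates of 2% to 3%"

# Split on a comma only when it is directly preceded by a digit and % or x
split_pat <- "(?<=\\d[%x]),"

parts1 <- strsplit(s1, split_pat, perl = TRUE)[[1]]
parts2 <- strsplit(s2, split_pat, perl = TRUE)[[1]]
parts1
#[1] "discount rates of 5% to 10%"   " and growth rates of 2% to 3%"
parts2
#[1] "discount rates of 5% to 10x"   " and growth rates of 2% to 3%"
```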

Splitting on a specific character after a regular expression
