I have a string that looks like this:
s = "discount rates of 5% to 10%, and growth rates of 2% to 3%"
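I want to split the string on the character that follows the first range, which in this case is the comma after "10%". The output would look like this: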
s = c("discount rates of 5% to 10%", " and growth rates of 2% to 3%")
The regular expression I use to extract the ranges is:
(\\$*\\d*\\.\\d+[%x] (to|and) \\$*\\d*\\.\\d+[%x])
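and it has worked well so far (some ranges end in "x" rather than "%"). However, I don't want to split on the regex match itself; I need to split on the character that immediately follows it. If it is easier, I could also split on the nearest space, so that the output looks like this:
s = c("discount rates of 5% to 10%,", "and growth rates of 2% to 3%")
The reason I want to split on the character after the regex match is that I want to keep both matches (here "5% to 10%" and "2% to 3%") but put them in separate strings.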
Answer 0 (score: 1)
How about this:
s1 <- "discount rates of 5% to 10%, and growth rates of 2% to 3%"
s2 <- "discount rates of 5% to 10x, and growth rates of 2% to 3%"
sub("\\s*,.*", "", s1) # first range
sub(sub("\\s*,.*", "", s1), "", s1) # second range
substring(sub(sub("\\s*,.*", "", s1), "", s1), 1, 1) # get first character in second range
### solution:
unlist(strsplit(s1, substring(sub(sub("\\s*,.*","", s1), "", s1), 1, 1))) # case 1
#[1] "discount rates of 5% to 10%" " and growth rates of 2% to 3%"
unlist(strsplit(s2, substring(sub(sub("\\s*,.*","", s2), "", s2), 1, 1))) # case 2
#[1] "discount rates of 5% to 10x" " and growth rates of 2% to 3%"
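The nested sub() calls above end up splitting on the literal comma. As a minimal sketch of another route, one could locate the first range with regexpr() and cut the string at the character right after the match, whatever that character is. The helper name split_after_first_range and the simplified pattern (an optional "$", integers or decimals, then "%" or "x") are assumptions for illustration, not the asker's exact regex.

```r
# Hypothetical helper: split x at the single character that follows
# the first match of a range pattern. The pattern is a simplified
# assumption, not the exact regex from the question.
split_after_first_range <- function(
    x,
    pattern = "\\$*\\d+(\\.\\d+)?[%x] (to|and) \\$*\\d+(\\.\\d+)?[%x]") {
  m <- regexpr(pattern, x)
  start <- as.integer(m)
  if (start == -1L) return(x)           # no range found: return input unchanged
  end <- start + attr(m, "match.length") - 1L
  # Keep everything up to the end of the match, drop the next character,
  # and keep the remainder as the second piece.
  c(substr(x, 1, end), substr(x, end + 2L, nchar(x)))
}

s1 <- "discount rates of 5% to 10%, and growth rates of 2% to 3%"
s2 <- "discount rates of 5% to 10x, and growth rates of 2% to 3%"
split_after_first_range(s1)
#[1] "discount rates of 5% to 10%"   " and growth rates of 2% to 3%"
split_after_first_range(s2)
```

Because the cut position comes from the match itself, this does not depend on the separator being a comma.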
Answer 1 (score: 1)
My solution may be a bit roundabout, but it might be good enough:
ss<-gsub("(\\d+[%x],)", "\\1XX",s)
s<-unlist(strsplit(ss, split="XX"))
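This assumes that "XX" does not actually occur anywhere in the text, so replace it with a string that is unlikely to appear. (I also simplified the regex to assume that the number is always followed by a percent sign or an "x", plus a comma.)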
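If relying on a placeholder string feels fragile, a possible alternative is a lookbehind with perl = TRUE, which splits directly on the separator. This is only a sketch under the same simplifying assumption (a digit, then "%" or "x", then a comma). Note that it splits at every such comma, not just the first one.

```r
s <- "discount rates of 5% to 10%, and growth rates of 2% to 3%"

# Split on a comma only when it directly follows a digit plus "%" or "x";
# the comma itself is consumed.
parts_comma <- strsplit(s, "(?<=\\d[%x]),", perl = TRUE)[[1]]
parts_comma
#[1] "discount rates of 5% to 10%"   " and growth rates of 2% to 3%"

# Variant: keep the comma and split on the space that follows it.
parts_space <- strsplit(s, "(?<=\\d[%x],) ", perl = TRUE)[[1]]
parts_space
#[1] "discount rates of 5% to 10%,"  "and growth rates of 2% to 3%"
```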