For example, I need to get everything outside the double quotes:
This is a string outside quotes, and "these words are in quotes" which I want to ignore.
The result should be:
This is a string outside quotes, and which I want to ignore.
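After a lot of searching I found something very similar: http://www.rubular.com/r/kxm0cEx8gD
However, it does not give me the expected result.
What I have managed to come up with so far is: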
(.?(?!["]))((?<!["]).?)
(.?(?!["])) - negative lookahead - I expect this to give me all the characters before the ["]
((?<!["]).?) - negative lookbehind - I expect this to give me all the characters not preceded by ["]
I am using R, which supports Perl syntax, and PCRE 8.0.
Answer 0 (score: 3)
You could try
sub('"[^"]*"', '', str1)
#[1] "This is a string outside quotes, and which I want to ignore."
Note: if there is more than one quoted instance, use gsub instead of sub
gsub('"[^"]*"', '', str2)
#[1] "This is a string outside quotes, and which I want to ignore. and thank you"
Data:
str1 <- 'This is a string outside quotes, and "these words are in quotes" which I want to ignore.'
str2 <- 'This is a string outside quotes, and "these words are in quotes" which I want to ignore. and "these words" thank you'
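Deleting a quoted span leaves the spaces on both sides of it in place, so the raw result has two consecutive spaces where the quote used to be. Below is a minimal sketch of one way to tidy that up, assuming that collapsing runs of spaces is acceptable for your data. The helper name strip_quoted is hypothetical and does not come from any package.

```r
# strip_quoted is a hypothetical helper name, not part of base R or any package.
strip_quoted <- function(x) {
  out <- gsub('"[^"]*"', '', x)   # drop every double-quoted span
  out <- gsub(' {2,}', ' ', out)  # collapse the runs of spaces left behind
  trimws(out)                     # remove leading and trailing whitespace
}

str2 <- 'This is a string outside quotes, and "these words are in quotes" which I want to ignore. and "these words" thank you'
strip_quoted(str2)
#[1] "This is a string outside quotes, and which I want to ignore. and thank you"
```

If the original spacing matters, skip the second gsub and keep only the first call.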
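Answer 1 (score: 2)
You can use s/"[^"]*"//g to remove the quoted parts of the string. Alternatively, if you do not want to modify the original string, you can use the non-destructive modifier /r, which has been available since Perl 5 version 14: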
use strict;
use warnings;
use 5.014;
my $ss = 'This is a string outside quotes, and "these words are in quotes" which I want to ignore.';
say $ss =~ s/"[^"]*"//gr;
Output:
This is a string outside quotes, and which I want to ignore.
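Answer 2 (score: 1)
The rm_between function in the qdapRegex package, which I maintain, is a general solution for removing or extracting content between a left and a right boundary: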
x <- c(
'This is a string outside quotes, and "these words are in quotes" which I want to ignore.',
'A second sentence "delete me" and also "delete me"'
)
library(qdapRegex)
rm_between(x, "\"", "\"")
## [1] "This is a string outside quotes, and which I want to ignore."
## [2] "A second sentence and also"
To see the regular expression that is used:
S("@rm_between", "\"")
## [1] "(\")(.*?)(\")"
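The pattern printed above relies on a lazy quantifier (.*?) where the first answer uses the negated class [^"]*. For single-line input like this, both stop at the first closing quote. The following is a rough base R sketch of that comparison, without qdapRegex and assuming perl = TRUE for the lazy quantifier. It also shows the extraction side of the problem using regmatches.

```r
x <- c(
  'This is a string outside quotes, and "these words are in quotes" which I want to ignore.',
  'A second sentence "delete me" and also "delete me"'
)

# Lazy match: stops at the first closing quote on single-line input.
lazy    <- gsub('"(.*?)"', '', x, perl = TRUE)
# Negated class: cannot run past a closing quote by construction.
negated <- gsub('"[^"]*"', '', x)
identical(lazy, negated)
#[1] TRUE

# Extract the quoted spans instead of deleting them (one list element per input string).
quoted <- regmatches(x, gregexpr('"[^"]*"', x))
```

The two patterns can diverge on multi-line strings, because . does not match a newline by default under perl = TRUE, while [^"] does.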