Regex to extract the parts of a string that are not inside double quotes

Time: 2015-04-08 17:01:09

Tags: regex r pcre

For example, I need to get everything outside the double quotes:

This is a string outside quotes, and "these words are in quotes" which I want to ignore.

The result should be:

This is a string outside quotes, and  which I want to ignore.

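After a lot of searching I found something very similar: http://www.rubular.com/r/kxm0cEx8gD

But it does not give me the expected result.

What I have managed to come up with so far is: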

(.?(?!["]))((?<!["]).?)

    (.?(?!["])) - negative lookahead: I expect this to give me all symbols before the ["]

    ((?<!["]).?) - negative lookbehind: I expect this to give me all symbols not preceded by ["]

I am using R, which supports Perl syntax, with PCRE 8.0.

3 answers:

Answer 0: (score: 3)

You could try

sub('"[^"]*"', '', str1)
#[1] "This is a string outside quotes, and  which I want to ignore."

Note: if there are multiple quoted instances, use gsub instead of sub

gsub('"[^"]*"', '', str2)
#[1] "This is a string outside quotes, and  which I want to ignore. and  thank you"

Data

 str1 <- 'This is a string outside quotes, and "these words are in quotes" which I want to ignore.'

 str2 <- 'This is a string outside quotes, and "these words are in quotes" which I want to ignore. and "these words" thank you'
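
The pattern above leaves a doubled space wherever a quoted span was removed. If that is unwanted, a second pass can collapse the leftover whitespace. A minimal sketch in base R; strip_quoted is a hypothetical helper name used only for illustration, not part of the answer:

```r
# Hypothetical helper (illustration only): remove every "..." span,
# then tidy up the whitespace that the removal leaves behind.
strip_quoted <- function(x) {
  out <- gsub('"[^"]*"', '', x)           # drop each double-quoted span
  out <- gsub('[[:space:]]+', ' ', out)   # collapse whitespace runs (newlines too) to one space
  gsub('^ +| +$', '', out)                # trim leading and trailing spaces
}

str2 <- 'This is a string outside quotes, and "these words are in quotes" which I want to ignore. and "these words" thank you'

strip_quoted(str2)
#[1] "This is a string outside quotes, and which I want to ignore. and thank you"
```

Note that the whitespace pass also collapses any doubled spaces that were already in the text, so skip it if the original spacing matters.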

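Answer 1: (score: 2)

You can use s/"[^"]*"//g to remove the quoted parts of the string. Or, if you do not want to modify the original string, you can use the non-destructive modifier /r, available since Perl 5 version 14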

use strict;
use warnings;
use 5.014;

my $ss = 'This is a string outside quotes, and "these words are in quotes" which I want to ignore.';

say $ss =~ s/"[^"]*"//gr;

Output

This is a string outside quotes, and  which I want to ignore.

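Answer 2: (score: 1)

The rm_between function in the qdapRegex package, which I maintain, is a general solution to the problem of removing or extracting content between a left and a right boundary: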

x <- c(
    'This is a string outside quotes, and "these words are in quotes" which I want to ignore.',
    'A second sentence "delete me" and also "delete me"'
)

library(qdapRegex)
rm_between(x, "\"", "\"")

## [1] "This is a string outside quotes, and which I want to ignore."
## [2] "A second sentence and also"

To see the regular expression it uses:

S("@rm_between", "\"")
## [1] "(\")(.*?)(\")"
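
The pattern shown uses a lazy quantifier (.*?) between the quotes, whereas the other answers use a negated character class ([^"]*). On the sample data the two should remove the same text. A small sketch in base R to check this, assuming perl = TRUE and no whitespace clean-up (so the spacing differs from the rm_between output above):

```r
x <- c(
    'This is a string outside quotes, and "these words are in quotes" which I want to ignore.',
    'A second sentence "delete me" and also "delete me"'
)

# Lazy quantifier: match as little as possible up to the next quote
lazy <- gsub('"(.*?)"', '', x, perl = TRUE)

# Negated class: match any run of characters that are not a quote
negated <- gsub('"[^"]*"', '', x, perl = TRUE)

identical(lazy, negated)
#[1] TRUE
```

One difference to keep in mind: with PCRE defaults, . does not match a newline, so the lazy form will not remove a quoted span that runs across lines, while [^"]* will.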