Java Pattern.compile: ignore escaped double quotes (\")

Posted: 2015-07-08 12:44:14

Tags: java regex escaping quotes

I'm having a hard time figuring out a pattern that ignores escaped quotes. I want this:

    "10\" 2 Topping Pizza, Pasta, or Sandwich for $5 each. Valid until 2pm. Carryout only.","blah blah" 

to be matched as:

   1> "10\" 2 Topping Pizza, Pasta, or Sandwich for $5 each. Valid until 2pm. Carryout only."
   2> "blah blah" 

I've been trying this:

    Pattern pattern = Pattern.compile("\"[^\"]*\"");
    Matcher matcher = pattern.matcher(filteredCoupons);

but I get this:

   1> "10\"
   2> "," 
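The behaviour above can be reproduced with a small sketch (the class name QuestionRepro and the helper findAll are hypothetical). [^"]* stops at the first " it meets, including the escaped one after 10\, so the first match ends there and the next match starts at the closing quote of the first string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuestionRepro {
    // The question's input: "10\" 2 Topping ... Carryout only.","blah blah"
    static final String SAMPLE =
        "\"10\\\" 2 Topping Pizza, Pasta, or Sandwich for $5 each. Valid until 2pm. Carryout only.\",\"blah blah\"";

    // Hypothetical helper: collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> matches = new ArrayList<>();
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // The pattern from the question: a quote, any run of non-quotes, a quote.
        for (String m : findAll("\"[^\"]*\"", SAMPLE)) {
            System.out.println(m);
        }
    }
}
```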

3 answers:

Answer 0 (score: 2)

The regex you are looking for is

"[^"\\]*(?:\\.[^"\\]*)*"

See the demo.

In Java:

String pattern = "\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"";
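A minimal runnable sketch applying that pattern to the question's input (the class name UnrolledQuoteMatcher and the method quotedStrings are hypothetical). [^"\\]* consumes ordinary characters, and each \\. consumes a backslash together with the character it escapes, so an escaped quote can never close the match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnrolledQuoteMatcher {
    // Regex: "[^"\\]*(?:\\.[^"\\]*)*"
    // [^"\\]* eats ordinary characters; \\. eats a backslash plus the escaped character.
    static final Pattern QUOTED = Pattern.compile("\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"");

    static final String SAMPLE =
        "\"10\\\" 2 Topping Pizza, Pasta, or Sandwich for $5 each. Valid until 2pm. Carryout only.\",\"blah blah\"";

    // Returns every quoted string found in input, escaped quotes included.
    static List<String> quotedStrings(String input) {
        Matcher matcher = QUOTED.matcher(input);
        List<String> result = new ArrayList<>();
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        quotedStrings(SAMPLE).forEach(System.out::println);
    }
}
```

Because [^"\\] excludes the backslash, a backslash can only be consumed by the \\. branch, which keeps the escape handling unambiguous.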

Answer 1 (score: 0)

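Your regex needs to accept either non-quote characters or characters escaped with a preceding \ (such as \"). In that case try

Pattern pattern = Pattern.compile("\"(\\\\.|[^\"])*\"");

This part of the regex, \\\\.|[^\"] (the Java string literal for the regex \\.|[^"]), will try to find

  • \\. - any escaped character (a backslash followed by any character),
  • | - or
  • [^"] - any non-quote character

I placed \\. before [^"] to prevent [^"] from matching the \ in foo\"bar".

In other words, for text like foo\"bar" and the regex \\.|[^"] you will get this matching:

foo\"bar"
^^^-matched by [^"]
foo\"bar"
   ^^-matched by \\.
foo\"bar"
     ^^^-matched by [^"]
foo\"bar"
        ^-can't be matched by anything: there is no \ before it, and it is a quote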
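A minimal sketch of why the order of the alternatives matters (the class name and constants are hypothetical). With the non-quote branch first, [^"] takes the backslash on its own, the escaped quote then ends the match, and the result degrades to the question's output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationOrderDemo {
    static final String SAMPLE =
        "\"10\\\" 2 Topping Pizza, Pasta, or Sandwich for $5 each. Valid until 2pm. Carryout only.\",\"blah blah\"";

    // Regex "(\\.|[^"])*": the escape branch gets the first chance at each backslash.
    static final String ESCAPE_FIRST = "\"(\\\\.|[^\"])*\"";
    // Regex "([^"]|\\.)*": the non-quote branch eats the backslash alone.
    static final String NON_QUOTE_FIRST = "\"([^\"]|\\\\.)*\"";

    // Collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> matches = new ArrayList<>();
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(findAll(ESCAPE_FIRST, SAMPLE));
        System.out.println(findAll(NON_QUOTE_FIRST, SAMPLE));
    }
}
```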

Demo:

import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Demo {
    public static void main(String[] args) {
        String filteredCoupons = "\"10\\\" 2 Topping Pizza, Pasta, or Sandwich for $5 each. Valid until 2pm. Carryout only.\",\"blah blah\"";
        Pattern pattern = Pattern.compile("\"(\\\\.|[^\"])*\"");
        Matcher matcher = pattern.matcher(filteredCoupons);
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
    }
}

Output:

"10\" 2 Topping Pizza, Pasta, or Sandwich for $5 each. Valid until 2pm. Carryout only."
"blah blah"

Answer 2 (score: 0)

You could also use a negative lookbehind:

(?s)".*?"(?<!\\.)

As a Java string:

"(?s)\".*?\"(?<!\\\\.)"

Test at regex101; test at regexplanet (click "Java").

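  • After a " is encountered, the lookbehind steps back over that one character and checks that it is not preceded by a backslash
  • Similar to ".*?(?<!\\)", but looking behind only after a " has been encountered should perform better
  • The (?s) flag makes the dot match newlines as well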
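A minimal sketch comparing the two lookbehind placements from the bullets above on the question's input (the class name and constants are hypothetical). It only checks that both find the same two strings; it makes no claim about their relative speed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindQuoteDemo {
    static final String SAMPLE =
        "\"10\\\" 2 Topping Pizza, Pasta, or Sandwich for $5 each. Valid until 2pm. Carryout only.\",\"blah blah\"";

    // Regex (?s)".*?"(?<!\\.): match a quote first, then reject it if a backslash precedes it.
    static final String AFTER_QUOTE = "(?s)\".*?\"(?<!\\\\.)";
    // Regex (?s)".*?(?<!\\)": run the lookbehind before every candidate closing quote.
    static final String BEFORE_QUOTE = "(?s)\".*?(?<!\\\\)\"";

    // Collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> matches = new ArrayList<>();
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        findAll(AFTER_QUOTE, SAMPLE).forEach(System.out::println);
        System.out.println(findAll(AFTER_QUOTE, SAMPLE).equals(findAll(BEFORE_QUOTE, SAMPLE)));
    }
}
```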

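Out of interest, I benchmarked the different versions with the sample string at regexhero.net (thanks to @stribizhev for the link!). I'm not sure whether regex101's step counter is accurate here.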


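Non-capturing groups were used for the benchmark only. Interestingly, the non-capturing "(?:\\.|[^"])*" almost doubled the result of the capturing-group version "(\\.|[^"])*".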
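A non-capturing group does not record where the group matched, which is one plausible reason for that difference. The minimal sketch below (class name and constants are hypothetical) only checks that the two variants find exactly the same strings; it does not measure speed or reproduce the regexhero figures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupKindDemo {
    static final String SAMPLE =
        "\"10\\\" 2 Topping Pizza, Pasta, or Sandwich for $5 each. Valid until 2pm. Carryout only.\",\"blah blah\"";

    // Regex "(\\.|[^"])*": capturing group around the alternation.
    static final String CAPTURING = "\"(\\\\.|[^\"])*\"";
    // Regex "(?:\\.|[^"])*": same alternation, group is not captured.
    static final String NON_CAPTURING = "\"(?:\\\\.|[^\"])*\"";

    // Collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> matches = new ArrayList<>();
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // The group kind changes bookkeeping only, not which substrings match.
        System.out.println(findAll(CAPTURING, SAMPLE).equals(findAll(NON_CAPTURING, SAMPLE)));
    }
}
```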