Question

I have the following input text:

    Clark is set to work in ''[[Superman (the Hero)|Superman]]'', a [[SuperHero Genre       II]] movie directed [[Source:NYTimes]]...
    Clark visited the [[University of Pleasantville]] campus in November 2009 to ...
    *[[1973]] &amp;ndash; [[Clark Kent]], superhero and newspaper reporter...
    After appearing in other movies, Clark starred as [[negative hero]] [[Alternate Superman]] in ''[[Superman (2003 film)|Superman]]''...
    Clark met ''[[Daily Planet]]'' reporter [[Louis Lane]]...

This is the pattern code I am using in Java:

    String pattern = "(?:\\p{Punct}|\\B|\\b)(\\[\\[[^(Arch:|Zeus:|Source:)].*?\\]\\])(?:\\p{Punct}|\\b|\\B)";
    Pattern r = Pattern.compile(pattern);
    Matcher m = r.matcher(data);
    while (m.find()) {
        System.out.println("Found value: " + m.group(1));
    }

I am reading the file line by line with BufferedReader's readLine (printing each line with System.out as I parse it), and with my regex I get the following output:

    Clark is set to work in ''[[Superman (the Hero)|Superman]]'', a [[SuperHero Genre II]] movie directed [[Source:NYTimes]]...
    Clark visited the [[University of Pleasantville]] campus in November 2009 to ...
    Found value: [[University of Pleasantville]]
    *[[1973]] – [[Clark Kent]], superhero and newspaper reporter...
    Found value: [[1973]]
    After appearing in other movies, Clark starred as [[negative hero]] [[Alternate Superman]] in ''[[Superman (2003 film)|Superman]]''...
    Found value: [[negative hero]]
    Found value: [[Alternate Superman]]
    Clark met ''[[Daily Planet]]'' reporter [[Louis Lane]]...
    Found value: [[Daily Planet]]
    Found value: [[Louis Lane]]

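As you can see, I am unable to extract everything inside the double square brackets, i.e. [[I_want_to_extract_these_except_Source_or_Arch_or_Zeus]]. For example, from the first line I should extract [[Superman (the Hero)|Superman]] and so on, but nothing is retrieved. How can I modify my regex to extract everything except [[Source:something]] and the like? Thanks.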

Answer 1

Use a negative lookahead (i.e. (?!...)), like this:

    \[\[(?!Arch:|Zeus:|Source).*?\]\]

See it in action: http://regex101.com/r/lJ6sH3/1
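For reference, here is a minimal sketch of how the lookahead could be used from Java. The class name WikiLinkExtractor and the method extractLinks are made up for illustration, and the pattern writes Source: with a trailing colon (an assumption, to mirror the other two prefixes) so that a link such as [[Sourcebook]] would not be skipped. The original [^(Arch:|Zeus:|Source:)] is a negated character class, so it rejects any link whose first character is one of those listed characters (for example the S in Superman) rather than rejecting the three prefixes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question or answer.
class WikiLinkExtractor {

    // "[[" not followed by an excluded prefix, then the shortest run up to "]]".
    // Assumption: "Source:" is written with a colon, like "Arch:" and "Zeus:".
    private static final Pattern LINK =
            Pattern.compile("\\[\\[(?!Arch:|Zeus:|Source:).*?\\]\\]");

    // Returns every [[...]] link in the line except the excluded ones.
    static List<String> extractLinks(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = LINK.matcher(line);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String line = "Clark is set to work in ''[[Superman (the Hero)|Superman]]'', "
                + "a [[SuperHero Genre II]] movie directed [[Source:NYTimes]]...";
        for (String link : extractLinks(line)) {
            System.out.println("Found value: " + link);
        }
    }
}
```

The surrounding (?:\p{Punct}|\B|\b) groups from the question are dropped in this sketch, since one of \b or \B holds at every position and so they do not restrict the match.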
