Parsing input text with Java regex

Date: 2014-07-06 14:19:06

Tags: java regex

I have the following input text:

    Clark is set to work in ''[[Superman (the Hero)|Superman]]'', a [[SuperHero Genre       II]] movie directed [[Source:NYTimes]]...
    Clark visited the [[University of Pleasantville]] campus in November 2009 to ...
    *[[1973]] – [[Clark Kent]], superhero and newspaper reporter...
    After appearing in other movies, Clark starred as [[negative hero]] [[Alternate Superman]] in ''[[Superman (2003 film)|Superman]]''...
    Clark met ''[[Daily Planet]]'' reporter [[Louis Lane]]...

This is the pattern code I'm using in Java:

    String pattern = "(?:\\p{Punct}|\\B|\\b)(\\[\\[[^(Arch:|Zeus:|Source:)].*?\\]\\])(?:\\p{Punct}|\\b|\\B)";
    Pattern r = Pattern.compile(pattern);
    Matcher m = r.matcher(data);
    while (m.find()) {
        System.out.println("Found value: " + m.group(1));
    }

I'm reading the file line by line with BufferedReader's readLine (printing each line with System.out as I parse it), and with my regex I get the following output:

    Clark is set to work in ''[[Superman (the Hero)|Superman]]'', a [[SuperHero Genre II]] movie directed [[Source:NYTimes]]...
    Clark visited the [[University of Pleasantville]] campus in November 2009 to ...
    Found value: [[University of Pleasantville]]
    *[[1973]] – [[Clark Kent]], superhero and newspaper reporter...
    Found value: [[1973]]
    After appearing in other movies, Clark starred as [[negative hero]] [[Alternate Superman]] in ''[[Superman (2003 film)|Superman]]''...
    Found value: [[negative hero]]
    Found value: [[Alternate Superman]]
    Clark met ''[[Daily Planet]]'' reporter [[Louis Lane]]...
    Found value: [[Daily Planet]]
    Found value: [[Louis Lane]]

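As you can see, I can't extract everything inside the double square brackets, i.e. [[I_want_to_extract_these_except_Source_or_Arch_or_Zeus]]. For example, from the first line I should extract [[Superman (the Hero)|Superman]] and so on, but nothing is retrieved. How can I modify my regex to extract everything except [[Source:something]] and the like? Thanks.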
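Why the first line yields nothing: inside square brackets, `(`, `|` and `)` are ordinary characters, so `[^(Arch:|Zeus:|Source:)]` is a character class that consumes exactly one character and rejects it if it is any of `( ) A r c h : | Z e u s S o`. It never tests for the words `Arch:`, `Zeus:` or `Source:`. Every link on the first line starts with `S`, so none of them can match. The small program below is a sketch that checks this with the question's own pattern; the class and method names (`CharClassDemo`, `links`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharClassDemo {
    // The pattern from the question, copied verbatim
    static final String ORIGINAL =
        "(?:\\p{Punct}|\\B|\\b)(\\[\\[[^(Arch:|Zeus:|Source:)].*?\\]\\])(?:\\p{Punct}|\\b|\\B)";

    // Collects group 1 of every match, like the loop in the question
    static List<String> links(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(ORIGINAL).matcher(line);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // [^(Arch:|Zeus:|Source:)] tests ONE character against the set
        // ( ) A r c h : | Z e u s S o, not the words "Arch:", "Zeus:", "Source:"
        Pattern oneChar = Pattern.compile("[^(Arch:|Zeus:|Source:)]");
        System.out.println(oneChar.matcher("S").matches()); // false: 'S' is in the set
        System.out.println(oneChar.matcher("D").matches()); // true: 'D' is not

        // Links whose text starts with 'S' are therefore skipped (prints [])
        System.out.println(links("a ''[[Superman (the Hero)|Superman]]'', a [[SuperHero Genre II]] movie"));
        // Links starting with other characters are found
        System.out.println(links("Clark met ''[[Daily Planet]]'' reporter [[Louis Lane]]..."));
    }
}
```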

1 answer:

Answer 0 (score: 1)

Use a negative lookahead (i.e. (?!...)), like this:

    \[\[(?!Arch:|Zeus:|Source).*?\]\]

See it in action: http://regex101.com/r/lJ6sH3/1
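A self-contained sketch of the answer's approach in Java, reading the input line by line with `BufferedReader.readLine()` as the question does. Assumptions: a `StringReader` over three of the question's sample lines stands in for the real file, the class and method names (`WikiLinkExtractor`, `extract`) are hypothetical, and the lookahead is written `Source:` with a colon to match the prefixes listed in the question (the answer's `Source` without a colon would also skip a link such as `[[Sourcebook]]`).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WikiLinkExtractor {
    // The answer's pattern as a Java string literal; "Source:" carries a colon
    // here (assumption) so only the prefixes named in the question are excluded
    private static final Pattern LINK =
        Pattern.compile("\\[\\[(?!Arch:|Zeus:|Source:).*?\\]\\]");

    // Returns every [[...]] link on one line, skipping Arch:, Zeus: and Source: links
    static List<String> extract(String line) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(line);
        while (m.find()) {
            links.add(m.group()); // whole match, brackets included
        }
        return links;
    }

    public static void main(String[] args) throws IOException {
        // Sample lines from the question; a StringReader stands in for the real file (assumption)
        String data = String.join("\n",
            "Clark is set to work in ''[[Superman (the Hero)|Superman]]'', a [[SuperHero Genre II]] movie directed [[Source:NYTimes]]...",
            "*[[1973]] \u2013 [[Clark Kent]], superhero and newspaper reporter...",
            "Clark met ''[[Daily Planet]]'' reporter [[Louis Lane]]...");

        try (BufferedReader reader = new BufferedReader(new StringReader(data))) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String link : extract(line)) {
                    System.out.println("Found value: " + link);
                }
            }
        }
    }
}
```

Run as is, it prints six `Found value:` lines: `[[Superman (the Hero)|Superman]]`, `[[SuperHero Genre II]]`, `[[1973]]`, `[[Clark Kent]]`, `[[Daily Planet]]` and `[[Louis Lane]]`; `[[Source:NYTimes]]` is skipped. The key difference from the question's pattern is that `(?!...)` is zero-width: it checks whether the text right after `[[` begins with one of the excluded prefixes without consuming anything, whereas the character class consumed and tested a single character.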