How to remove everything between two outer delimiters?

Posted: 2014-03-03 10:48:57

Tags: java regex wikipedia

I have a string that contains the following part:

{{Infobox musical artist
|honorific-prefix  = [[The Honourable]]
| name = Bob Marley
| image = Bob-Marley.jpg
| alt = Black and white image of Bob Marley on stage with a guitar
| caption = Bob Marley in concert, 1980.
| background = solo_singer
| birth_name = Robert Nesta Marley
| alias = Tuff Gong
| birth_date = {{birth date|df=yes|1945|2|6}}
| birth_place = [[Nine Mile, Jamaica|Nine Mile]], [[Jamaica]]
| death_date = {{death date and age|df=yes|1981|5|11|1945|2|6}}
| death_place = [[Miami]], [[Florida]]
| instrument = Vocals, guitar, percussion
| genre = [[Reggae]], [[ska]], [[rocksteady]]
| occupation = [[Singer-songwriter]], [[musician]], [[guitarist]] 
| years_active = 1962–1981
| label = [[Beverley's]], [[Studio One (record label)|Studio One]],
| associated_acts = [[Bob Marley and the Wailers]]
| website = {{URL|bobmarley.com}}
}}

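I want to remove all of it. If I try the regex \{\{(.*?)\}\}, it captures {{birth date|df=yes|1945|2|6}}, which makes sense. So I tried \{\{([^\}]*?)\}\}, which grabs from the start but ends on that same line, which also makes sense, since that is where it first encounters }}. I also tried it without the ? to make it greedy, but the result is the same. My question is: how can I remove everything inside {{ }}, no matter how many of the same characters appear inside it?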
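To make the failures described above concrete, here is a small Java sketch run against a cut-down copy of the infobox. The SAMPLE string and the firstMatch helper are illustrative and not part of the question. Each call prints the first match of one of the attempted patterns:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyMatchDemo {
    // Hypothetical, shortened version of the infobox from the question.
    static final String SAMPLE = "{{Infobox\n"
            + "| birth_date = {{birth date|1945}}\n"
            + "| website = {{URL|x}}\n"
            + "}}";

    // Returns the first substring matched by the regex, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Attempt 1: without DOTALL, '.' cannot match '\n', so no match can start
        // at the outer "{{"; the first match is the one-line inner template.
        System.out.println(firstMatch("\\{\\{(.*?)\\}\\}", SAMPLE));

        // Attempt 2: [^}] does match '\n' but never '}', so the match runs from the
        // outer "{{" to the first "}}", which closes the inner birth date template.
        System.out.println(firstMatch("\\{\\{([^\\}]*?)\\}\\}", SAMPLE));

        // Dropping the '?' changes nothing: a greedy [^}]* still cannot pass a '}'.
        System.out.println(firstMatch("\\{\\{([^\\}]*)\\}\\}", SAMPLE));
    }
}
```

The first call prints {{birth date|1945}}; the second and third both print the text from {{Infobox up to the end of {{birth date|1945}}, which is why neither a reluctant nor a greedy [^}] quantifier can reach the outer closing braces.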

Edit: if you want my full input, here it is: https://en.wikipedia.org/w/index.php?maxlag=5&title=Bob+Marley&action=raw

4 answers:

Answer 0 (score: 1)

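Here is a solution using a DOTALL Pattern and a greedy quantifier, for input that contains only one instance of the fragment to remove (i.e. the fragment is replaced with an empty String):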

String input = "Foo {{Infobox musical artist\n"
                + "|honorific-prefix  = [[The Honourable]]\n"
                + "| name = Bob Marley\n"
                + "| image = Bob-Marley.jpg\n"
                + "| alt = Black and white image of Bob Marley on stage with a guitar\n"
                + "| caption = Bob Marley in concert, 1980.\n"
                + "| background = solo_singer\n"
                + "| birth_name = Robert Nesta Marley\n"
                + "| alias = Tuff Gong\n"
                + "| birth_date = {{birth date|df=yes|1945|2|6}}\n"
                + "| birth_place = [[Nine Mile, Jamaica|Nine Mile]], [[Jamaica]]\n"
                + "| death_date = {{death date and age|df=yes|1981|5|11|1945|2|6}}\n"
                + "| death_place = [[Miami]], [[Florida]]\n"
                + "| instrument = Vocals, guitar, percussion\n"
                + "| genre = [[Reggae]], [[ska]], [[rocksteady]]\n"
                + "| occupation = [[Singer-songwriter]], [[musician]], [[guitarist]] \n"
                + "| years_active = 1962–1981\n"
                + "| label = [[Beverley's]], [[Studio One (record label)|Studio One]],\n"
                + "| associated_acts = [[Bob Marley and the Wailers]]\n"
                + "| website = {{URL|bobmarley.com}}\n" + "}} Bar";
//                                    |DOTALL flag
//                                    |  |first two curly brackets
//                                    |  |     |multi-line dot
//                                    |  |     | |last two curly brackets
//                                    |  |     | |        | replace with empty
System.out.println(input.replaceAll("(?s)\\{\\{.+\\}\\}", ""));

Output

Foo  Bar
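The restriction to a single instance matters. A quick Java check, using made-up sample strings, shows what the greedy (?s) pattern from this answer does when the text contains two separate top-level blocks:

```java
public class GreedyCaveatDemo {
    // The pattern from this answer: DOTALL plus a greedy ".+".
    static String stripGreedy(String text) {
        return text.replaceAll("(?s)\\{\\{.+\\}\\}", "");
    }

    public static void main(String[] args) {
        // One top-level block with nesting inside: removed as intended.
        System.out.println(stripGreedy("Foo {{a {{b}} c}} Bar")); // Foo  Bar

        // Two top-level blocks: ".+" runs from the first "{{" to the LAST "}}",
        // so the ordinary text between them ("keep") is deleted too.
        System.out.println(stripGreedy("A {{x}} keep {{y}} B"));  // A  B
    }
}
```

On a full Wikipedia article, which usually contains many templates, this would delete most of the prose between the first and last template.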

Note added after the comments

This scenario amounts to using regular expressions to manipulate a markup language.

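Regular expressions are not meant for parsing hierarchical markup entities and will not work reliably here, so this answer is only a stub, at best an ugly workaround for this case.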

For the famous SO thread on parsing markup with regular expressions, see here.
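Since the problem is the nesting, one alternative (swapped in here, not taken from any of the answers) is to skip regex entirely and count brace depth while scanning the string once. The sketch below assumes that templates open with "{{" and close with "}}", treats single braces as ordinary text, keeps stray "}}" that appear outside any block, and does not try to handle wikitext parameters such as {{{1}}}; it is not a full wikitext parser. The class and method names are hypothetical.

```java
public class TemplateStripper {
    // Removes every top-level {{...}} block, at any nesting depth, by tracking
    // how many "{{" are currently open. Text is kept only while depth == 0.
    // An unclosed "{{" causes everything after it to be dropped.
    static String removeTemplates(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int depth = 0;
        int i = 0;
        while (i < text.length()) {
            if (text.startsWith("{{", i)) {
                depth++;                    // entering a (possibly nested) template
                i += 2;
            } else if (depth > 0 && text.startsWith("}}", i)) {
                depth--;                    // leaving the innermost open template
                i += 2;
            } else {
                if (depth == 0) {
                    out.append(text.charAt(i)); // outside every template: keep it
                }
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "Foo {{Infobox musical artist\n"
                + "| birth_date = {{birth date|df=yes|1945|2|6}}\n"
                + "| website = {{URL|bobmarley.com}}\n"
                + "}} Bar";
        System.out.println(removeTemplates(input));                         // Foo  Bar
        System.out.println(removeTemplates("A {{x}} keep {{y}} B"));        // A  keep  B
        System.out.println(removeTemplates("Foo {{a {{b {{c}} d}} e}} y")); // Foo  y
    }
}
```

Unlike the greedy pattern, this keeps the text between separate blocks and handles any nesting depth, at the cost of a few lines of hand-written scanning.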

Answer 1 (score: 0)

Use a greedy quantifier instead of your reluctant one.

http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

Edit: spoon-feeding: "\{\{.*\}\}"
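One detail the pattern in this answer leaves implicit: in Java, '.' does not match line terminators unless DOTALL is enabled, so on multi-line input like the infobox the greedy pattern also needs the (?s) flag (or Pattern.DOTALL). A minimal check on a made-up three-line string:

```java
public class DotallDemo {
    // The pattern as given in this answer, without any flags.
    static String withoutDotall(String text) {
        return text.replaceAll("\\{\\{.*\\}\\}", "");
    }

    // The same pattern with DOTALL switched on inline via (?s).
    static String withDotall(String text) {
        return text.replaceAll("(?s)\\{\\{.*\\}\\}", "");
    }

    public static void main(String[] args) {
        String text = "{{a\n{{b}}\n}}";
        // Without DOTALL, '.' stops at each '\n', so only the one-line "{{b}}" matches.
        System.out.println(withoutDotall(text).equals("{{a\n\n}}")); // true
        // With DOTALL, ".*" spans the line breaks and the whole block is removed.
        System.out.println(withDotall(text).isEmpty());               // true
    }
}
```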

Answer 2 (score: 0)

Try this pattern; it should take care of everything:

"\\D\\{\\{I.+[\\P{M}\\p{M}*+].+\\}\\}\\D"

Flags: DOTALL

Code:

String result = searchText.replaceAll("(?s)\\D\\{\\{I.+[\\P{M}\\p{M}*+].+\\}\\}\\D", "");

Example: http://fiddle.re/5n4zg
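A quick Java check of this pattern on a made-up sample shows two things worth knowing before using it. First, DOTALL has to be switched on, for example inline with (?s); without it nothing is removed from this multi-line input. Second, the leading and trailing \D each consume one real character, so the spaces around the block disappear as well (and a block at the very start or end of the string cannot match). The pattern also requires the template name to start with "I".

```java
public class Answer2Check {
    // The pattern from this answer, with DOTALL enabled inline via (?s).
    static final String WITH_DOTALL =
            "(?s)\\D\\{\\{I.+[\\P{M}\\p{M}*+].+\\}\\}\\D";
    // The same pattern with no flags.
    static final String WITHOUT_DOTALL =
            "\\D\\{\\{I.+[\\P{M}\\p{M}*+].+\\}\\}\\D";

    public static void main(String[] args) {
        String input = "Foo {{Infobox x\n| a = b\n}} Bar";
        // No flag: ".+" cannot cross line breaks, so nothing matches.
        System.out.println(input.replaceAll(WITHOUT_DOTALL, "").equals(input)); // true
        // DOTALL: the block is removed, and the \D on each side eats a space.
        System.out.println(input.replaceAll(WITH_DOTALL, ""));                   // FooBar
    }
}
```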

Answer 3 (score: 0)

This regex matches exactly one such block:

\{\{([^{}]*?\{\{.*?\}\})*.*?\}\}

See the live demo

In Java, to remove all such blocks:

str = str.replaceAll("(?s)\\{\\{([^{}]*?\\{\\{.*?\\}\\})*.*?\\}\\}", "");
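The group ([^{}]*?\{\{.*?\}\})* allows inner templates nested one level deep, which is enough for the infobox in the question, but not more. A small Java check with made-up inputs illustrates both cases:

```java
public class OneLevelNestingDemo {
    // The pattern from this answer (DOTALL on, one level of nested {{...}} allowed).
    static final String ONE_LEVEL = "(?s)\\{\\{([^{}]*?\\{\\{.*?\\}\\})*.*?\\}\\}";

    static String strip(String text) {
        return text.replaceAll(ONE_LEVEL, "");
    }

    public static void main(String[] args) {
        // One level of nesting, as in the infobox: the whole block is removed.
        System.out.println(strip("Foo {{A|b={{c}}|d={{e}}\n}} Bar")); // Foo  Bar

        // Two levels of nesting: the inner ".*?" stops at the innermost "}}",
        // so the match ends one level too early and " e}}" is left behind.
        System.out.println(strip("x {{a {{b {{c}} d}} e}} y"));       // x  e}} y
    }
}
```

For arbitrarily deep nesting, counting "{{" and "}}" while scanning the string is more robust than extending the pattern one level at a time.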