Suppose I have a few lines of Wikipedia XML that look like this:
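[[Image:ChicagoAnarchists.jpg|thumb|A sympathetic engraving by [[Walter Crane]] of the executed "Anarchists of Chicago" after the [[Haymarket affair]]. The Haymarket affair is generally considered the most significant event for the origin of international [[May Day]] observances]] In 1907, the [[International Anarchist Congress of Amsterdam]] gathered delegates from 14 different countries, among which important figures of the anarchist movement, including [[Errico Malatesta]]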
I want to remove the text that starts with "[[Image:" and is closed by "observances]]".
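There may be several other lines of text that also contain brackets, so I don't want a greedy search; otherwise I might accidentally remove other bracketed text as well.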
For example, if I just used a greedy \\[\\[Image:.*\\]\\], I believe it would remove everything up to the last closing brackets (after Errico Malatesta).
Is there a regular expression that would make this easier?
Answer 0 (score: 2)
Let's see... how about using lazy (reluctant) repetition instead of greedy?
\[\[Image:.*?observances\]\]
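A minimal sketch of the difference this answer relies on, applying the question's greedy pattern and the lazy one above to the question's text. The class LazyVsGreedy and its methods are hypothetical names for illustration. The greedy .* runs on to the last "]]" in the input, so the whole string is removed; .*? stops at the first "observances]]". Note the lazy pattern depends on the literal word "observances", so it only fits this particular caption.

```java
import java.util.regex.Pattern;

public class LazyVsGreedy {
    // The question's sample text (hypothetical constant name).
    static final String TEXT = "[[Image:ChicagoAnarchists.jpg|thumb|A sympathetic engraving by [[Walter Crane]]"
        + " of the executed \"Anarchists of Chicago\" after the [[Haymarket affair]]."
        + " The Haymarket affair is generally considered the most significant event for the origin"
        + " of international [[May Day]] observances]] In 1907, the [[International Anarchist Congress"
        + " of Amsterdam]] gathered delegates from 14 different countries, among which important figures"
        + " of the anarchist movement, including [[Errico Malatesta]]";

    // Greedy: .* backtracks only as far as the LAST "]]" in the input.
    static final Pattern GREEDY = Pattern.compile("\\[\\[Image:.*\\]\\]");
    // Lazy (reluctant): .*? grows one character at a time until "observances]]" follows.
    static final Pattern LAZY = Pattern.compile("\\[\\[Image:.*?observances\\]\\]");

    static String removeGreedy(String s) {
        return GREEDY.matcher(s).replaceAll("");
    }

    static String removeLazy(String s) {
        return LAZY.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println("greedy: [" + removeGreedy(TEXT) + "]");
        System.out.println("lazy:   [" + removeLazy(TEXT) + "]");
    }
}
```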
Answer 1 (score: 0)
What about this one?
s.replaceAll("(\\[{2}Image:(?:(?:\\[{2}).*\\]{2}|[^\\[])*\\]{2})", "");
It replaces only this text:
[[Image:ChicagoAnarchists.jpg|thumb|A sympathetic engraving by [[Walter Crane]] of the executed "Anarchists of Chicago" after the [[Haymarket affair]]. The Haymarket affair is generally considered the most significant event for the origin of international [[May Day]] observances]]
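A runnable sketch of Answer 1's replaceAll call; the class AnswerOneDemo and the short input string are hypothetical. It shows the pattern also working when the image block sits in the middle of a sentence. One caveat: the nested-pair alternative uses a greedy .*, so the engine first runs to the last "]]" in the input and only reaches the correct one by backtracking, which may become slow on long pages.

```java
import java.util.regex.Pattern;

public class AnswerOneDemo {
    // Answer 1's pattern, unchanged. The repeated group alternates between:
    //   \[{2}.*\]{2}  a nested [[...]] pair (greedy .*, resolved by backtracking)
    //   [^\[]         any single character other than '['
    static final Pattern IMAGE = Pattern.compile(
        "(\\[{2}Image:(?:(?:\\[{2}).*\\]{2}|[^\\[])*\\]{2})");

    static String strip(String s) {
        return IMAGE.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // Hypothetical input: the image block is in the middle, followed by another link.
        System.out.println(strip("Before [[Image:x.jpg|thumb|see [[A]] here]] after [[B]]"));
    }
}
```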
Answer 2 (score: 0)
This works:
str.replaceAll("^\\[\\[([^\\[]*?(\\[\\[[^\\]]*\\]\\])?[^\\[]*?)*?\\]\\]\\s*", "");
Output for the question's input:
In 1907, the [[International...
This works because it looks for matching [[ and ]] pairs (and the text around them) inside the first pair.
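A small sketch exercising Answer 2's pattern on a few hypothetical inputs (the class AnswerTwoDemo is also hypothetical). Because the pattern starts with ^ and never mentions "Image:", it appears to remove whatever [[...]] block sits at the very start of the string, plus the whitespace after it (\s*), and to leave a block anywhere else untouched.

```java
import java.util.regex.Pattern;

public class AnswerTwoDemo {
    // Answer 2's pattern, unchanged: ^ anchors it to the start of the input,
    // the inner group allows one [[...]] pair per repetition, and \s* eats trailing whitespace.
    static final Pattern LEADING_BLOCK = Pattern.compile(
        "^\\[\\[([^\\[]*?(\\[\\[[^\\]]*\\]\\])?[^\\[]*?)*?\\]\\]\\s*");

    static String strip(String s) {
        return LEADING_BLOCK.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // Block at the start of the string: removed together with the following space.
        System.out.println(strip("[[Image:x.jpg|see [[A]] here]] after [[B]]"));
        // Block not at the start: ^ prevents a match, so the text is unchanged.
        System.out.println(strip("Intro [[Image:x.jpg]] text"));
        // Not an image at all, but still removed because it is the leading block.
        System.out.println(strip("[[Category:Foo]] text"));
    }
}
```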
Answer 3 (score: 0)
Maybe something like this:
(.*?\\[\\[[^\\[]*?\\]\\][^\\[]*\\]\\])
I tried:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class My {
    public static void main(String[] args) {
        String foo = "[[Image:ChicagoAnarchists.jpg|thumb|A sympathetic engraving by [[Walter Crane]] of the executed \"Anarchists of Chicago\" after the [[Haymarket affair]]. The Haymarket affair is generally considered the most significant event for the origin of international [[May Day]] observances]] In 1907, the [[International Anarchist Congress of Amsterdam]] gathered delegates from 14 different countries, among which important figures of the anarchist movement, including [[Errico Malatesta]]";
        // .*? lazily consumes the prefix; then one inner [[...]] pair, any non-'[' text, and the closing ]]
        Matcher m = Pattern.compile("(.*?\\[\\[[^\\[]*?\\]\\][^\\[]*\\]\\])").matcher(foo);
        while (m.find()) {
            System.out.print(m.group(1)); // print each match
        }
    }
}
which prints:
[[Image:ChicagoAnarchists.jpg|thumb|A sympathetic engraving by [[Walter Crane]] of the executed "Anarchists of Chicago" after the [[Haymarket affair]]. The Haymarket affair is generally considered the most significant event for the origin of international [[May Day]] observances]]
Hope this helps :D
Answer 4 (score: 0)
Using the following test string (note that I added [[image:foobar[[foo [baz] bar]]foobar]] to it):
[[Image:ChicagoAnarchists.jpg|thumb|A sympathetic engraving by [[Walter Crane]] of the executed \"Anarchists of Chicago\" after the [[Haymarket affair]]. The Haymarket affair is generally considered the most significant event for the origin of international [[May Day]] observances]] In 1907, the [[International Anarchist Congress of[[image:foobar[[foo [baz] bar]]foobar]] Amsterdam]] gathered delegates from 14 different countries, among which important figures of the anarchist movement, including [[Errico Malatesta]]
Regex:
(?i)\\[\\[image:(?:\\[\\[(?:(?!(?:\\[\\[|]])).)*]]|(?:(?!(?:\\[\\[|]])).)*?)*?]]
testString.replaceAll(<above pattern>, "")
returns:
In 1907, the [[International Anarchist Congress of Amsterdam]] gathered delegates from 14 different countries, among which important figures of the anarchist movement, including [[Errico Malatesta]]
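A runnable sketch of the replaceAll call above, using Answer 4's pattern and test string unchanged; the class name AnswerFourDemo and its fields are hypothetical. When run, the result appears to keep the single space that followed "observances]]", since the pattern does not consume trailing whitespace.

```java
import java.util.regex.Pattern;

public class AnswerFourDemo {
    // Answer 4's pattern as a Java string literal. (?i) makes "image:" case-insensitive;
    // each repetition is either one nested [[...]] pair or text containing no "[[" / "]]".
    static final Pattern IMAGE = Pattern.compile(
        "(?i)\\[\\[image:(?:\\[\\[(?:(?!(?:\\[\\[|]])).)*]]|(?:(?!(?:\\[\\[|]])).)*?)*?]]");

    // Answer 4's test string, including the extra [[image:foobar[[foo [baz] bar]]foobar]].
    static final String TEST_STRING = "[[Image:ChicagoAnarchists.jpg|thumb|A sympathetic engraving by [[Walter Crane]]"
        + " of the executed \"Anarchists of Chicago\" after the [[Haymarket affair]]."
        + " The Haymarket affair is generally considered the most significant event for the origin"
        + " of international [[May Day]] observances]] In 1907, the [[International Anarchist Congress"
        + " of[[image:foobar[[foo [baz] bar]]foobar]] Amsterdam]] gathered delegates from 14 different"
        + " countries, among which important figures of the anarchist movement, including [[Errico Malatesta]]";

    static String strip(String s) {
        return IMAGE.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println("[" + strip(TEST_STRING) + "]");
    }
}
```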
Here is a more detailed breakdown of the regex:
(?i) # Case insensitive flag
\[\[image: # Match literal characters '[[image:'
(?: # Begin non-capturing group
\[\[ # Match literal characters '[['
(?: # Begin non-capturing group
(?! # Begin non-capturing negative look-ahead group
(?: # Begin non-capturing group
\[\[ # Match literal characters '[['
| # Match previous atom or next atom
]] # Match literal characters ']]'
) # End non-capturing group
) # End non-capturing negative look-ahead group
. # Match any character
) # End non-capturing group
* # Match previous atom zero or more times
]] # Match literal characters ']]'
| # Match previous atom or next atom
(?: # Begin non-capturing group
(?! # Begin non-capturing negative look-ahead group
(?: # Begin non-capturing group
\[\[ # Match literal characters '[['
| # Match previous atom or next atom
]] # Match literal characters ']]'
) # End non-capturing group
) # End non-capturing negative look-ahead group
. # Match any character
) # End non-capturing group
*? # Reluctantly match previous atom zero or more times
) # End non-capturing group
*? # Reluctantly match previous atom zero or more times
]] # Match literal characters ']]'
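This only handles one level of nested [[...]] patterns. As TJR noted in this answer to this question, regular expressions cannot handle arbitrarily deep nesting. So this pattern will not match an [[image:...]] block that contains a second level of nesting, such as [[foo[[baz]]bar]].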
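Since a single Java regex cannot keep track of arbitrary depth, one common workaround is to scan the text and count "[[" and "]]" by hand. The sketch below is an illustration under stated assumptions, not part of any answer above: the class NestedImageStripper and its method removeImageBlocks are hypothetical, every "[[" and "]]" is treated literally as an opening or closing pair, and an unbalanced block is left untouched. It also shows Answer 4's one-level pattern leaving a two-level block alone.

```java
import java.util.regex.Pattern;

public class NestedImageStripper {
    // Answer 4's pattern: handles only ONE level of nested [[...]] inside an image block.
    static final Pattern ONE_LEVEL = Pattern.compile(
        "(?i)\\[\\[image:(?:\\[\\[(?:(?!(?:\\[\\[|]])).)*]]|(?:(?!(?:\\[\\[|]])).)*?)*?]]");

    // Hypothetical helper: removes every [[image:...]] block (case-insensitive) by
    // counting "[[" / "]]" depth, so any nesting level works.
    static String removeImageBlocks(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            if (s.regionMatches(true, i, "[[image:", 0, 8)) {
                int depth = 0;
                int j = i;
                while (j < s.length()) {
                    if (s.startsWith("[[", j)) {
                        depth++;
                        j += 2;
                    } else if (s.startsWith("]]", j)) {
                        depth--;
                        j += 2;
                        if (depth == 0) {
                            break; // found the "]]" that closes the image block
                        }
                    } else {
                        j++;
                    }
                }
                if (depth == 0) {
                    i = j; // skip the whole balanced block
                    continue;
                }
                out.append(s, i, s.length()); // unbalanced: keep the remainder as-is
                break;
            }
            out.append(s.charAt(i));
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String twoLevels = "[[image:a[[foo[[baz]]bar]]b]] rest";
        // The one-level regex cannot get past "[[foo[[baz", so nothing is removed.
        System.out.println(ONE_LEVEL.matcher(twoLevels).replaceAll(""));
        // The depth counter removes the whole block.
        System.out.println(removeImageBlocks(twoLevels));
    }
}
```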
For an excellent regex reference, see Regular-Expressions.info.