我试图创建一个正则表达式来捕捉[[xyz | asd]],但不是[[xyz]] 在文中:
'''Diversity Day'''" is the second episode of the [[The Office (U.S. season 1)]|first season]] of the American [[comedy]] [[television program|television series]] ''[[The Office (U.S. TV series)|The Office]]'', and the show's second episode overall. Written by [[B. J. Novak]] and directed by [[Ken Kwapis]], it first aired in the United States on March 29, 2005, on [[NBC]]. The episode guest stars ''Office'' consulting producer [[Larry Wilmore]] as [[List_of_characters_from_The_Office_(US)#Mr._Brown|Mr. Brown]].
应捕获以下结果:
[[The Office (U.S. season 1)]|first season]] <-- keep in mind of the "]" before "|", "]" in that case is a literal character not a breaking one "]]"
[[television program|television series]]
[[The Office (U.S. TV series)|The Office]]
[[List_of_characters_from_The_Office_(US)#Mr._Brown|Mr. Brown]]
我试图使用的是:
\[\[([^|]+)\|([^|]+)\]\]
但我无法弄清楚如何忽略&#34; |&#34;和&#34;]]&#34;在小组中。 [^ |(]])]不会工作,因为它不匹配&#34;]]&#34;但只有角色&#34;]&#34; (它需要是整个词)
请帮助,谢谢!
答案 0 :(得分:6)
您可以在此处依赖tempered greedy token:
\[\[((?:(?!]]).)*)\|((?:(?!]]).)*)]]
请参阅regex demo
<强>详情:
\[\[
- 2 [
个符号((?:(?!]]).)*)
- 第1组(注意*
可以变成懒惰*?
,特别是如果第一部分比第二部分短,则捕获:
(?:(?!]]).)*
- 零个或多个序列
.
- 任何字符(但是换行符,如果您的字符串跨越多行,请使用带RegexOptions.Singleline
的模式)... (?!]])
- 尚未启动]]
序列(即,.
与]
不匹配,而]
跟随另一个\|
<) / LI>
|
- 文字((?:(?!]]).)*)
]]
- 第2组捕获与第2组相同的子模式]
- 2个文字\[\[([^]|]*(?:](?!])[^]|]*)*)\|([^]]*(?:](?!])[^]]*)*)]]
。这个正则表达式的效率更高的“展开”版本是:
|
请参阅regex demo。此正则表达式将第一个{{1}}视为内部字段分隔符。请参阅my other answer,了解如何展开驯服的贪婪代币。