Question

我试图创建一个正则表达式来捕捉[[xyz | asd]]，但不是[[xyz]] 在文中：

'''Diversity Day'''" is the second episode of the [[The Office (U.S. season 1)]|first season]] of the American [[comedy]] [[television program|television series]] ''[[The Office (U.S. TV series)|The Office]]'', and the show's second episode overall. Written by [[B. J. Novak]] and directed by [[Ken Kwapis]], it first aired in the United States on March 29, 2005, on [[NBC]]. The episode guest stars ''Office'' consulting producer [[Larry Wilmore]] as [[List_of_characters_from_The_Office_(US)#Mr._Brown|Mr. Brown]].

应捕获以下结果：

[[The Office (U.S. season 1)]|first season]] <-- keep in mind of the "]" before "|", "]" in that case is a literal character not a breaking one "]]"
[[television program|television series]]
[[The Office (U.S. TV series)|The Office]]
[[List_of_characters_from_The_Office_(US)#Mr._Brown|Mr. Brown]]

我试图使用的是：

\[\[([^|]+)\|([^|]+)\]\]

但我无法弄清楚如何忽略＆＃34; |＆＃34;和＆＃34;]]＆＃34;在小组中。 [^ |（]]）]不会工作，因为它不匹配＆＃34;]]＆＃34;但只有角色＆＃34;]＆＃34; （它需要是整个词）

请帮助，谢谢！

Answer 1

您可以在此处依赖tempered greedy token：

\[\[((?:(?!]]).)*)\|((?:(?!]]).)*)]]

请参阅regex demo

<强>详情：

\[\[ - 2 [个符号
((?:(?!]]).)*) - 第1组（注意*可以变成懒惰*?，特别是如果第一部分比第二部分短，则捕获：
- (?:(?!]]).)* - 零个或多个序列
  - . - 任何字符（但是换行符，如果您的字符串跨越多行，请使用带RegexOptions.Singleline的模式）...
  - (?!]]) - 尚未启动]]序列（即，.与]不匹配，而]跟随另一个\| <） / LI>
| - 文字((?:(?!]]).)*)
]] - 第2组捕获与第2组相同的子模式
] - 2个文字\[\[([^]|]*(?:](?!])[^]|]*)*)\|([^]]*(?:](?!])[^]]*)*)]]。

这个正则表达式的效率更高的“展开”版本是：

请参阅regex demo。此正则表达式将第一个{{1}}视为内部字段分隔符。请参阅my other answer，了解如何展开驯服的贪婪代币。

正则表达式除了一些之外的任何字符

1 个答案: