Question

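I'm trying to convert some documents (Wikipedia articles) that contain links written in a specific markup convention. I want to render them in a reader-friendly form, without the links. The convention is: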

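1. Names in double brackets of the pattern [[Article Name|Display Name]] should be captured ignoring the pipe and the text before it, as well as the enclosing brackets: Display Name.
2. Names in double brackets of the pattern [[Article Name]] should be captured without the brackets: Article Name.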

Nested approach (yields the desired result)

I know I can handle #1 and #2 with nested re.sub() calls. For example, this does what I want:

import re

s = 'including the [[Royal Danish Academy of Sciences and Letters|Danish Academy of Sciences]], [[Norwegian Academy of Science and Letters|Norwegian Academy of Sciences]], [[Russian Academy of Sciences]], and [[National Academy of Sciences|US National Academy of Sciences]].'

# raw strings keep the regex backslashes intact
re.sub(r'\[\[(.*?\|)(.*?)\]\]', r'\2',        # case 1
       re.sub(r'\[\[([^|]+)\]\]', r'\1', s)   # case 2
)
# result is correct:
'including the Danish Academy of Sciences, Norwegian Academy of Sciences, Russian Academy of Sciences, and US National Academy of Sciences.'

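Single-pass approach (looking for a solution here)

For efficiency and for my own improvement, I'd like to know whether there is a single-pass approach.

What I've tried: in an optional group 1, I want to greedily capture everything between [[ and |, if a pipe is present. Then in group 2, I want to capture everything else up to ]]. Then I want to return only group 2.

My problem is in making the greedy capture optional: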

re.sub(r'\[\[([^|]*\|)?(.*?)\]\]', r'\2', s)
# does NOT return the desired result:
'including the Danish Academy of Sciences, Norwegian Academy of Sciences, US National Academy of Sciences.'
# is missing: 'Russian Academy of Sciences, and '
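
One way to see what goes wrong, as a minimal diagnostic sketch: list what each group captured for every match of the attempted pattern. The names `attempt` and `matches` are illustrative only and not part of the original code.

```python
import re

s = ('including the [[Royal Danish Academy of Sciences and Letters|Danish Academy of Sciences]], '
     '[[Norwegian Academy of Science and Letters|Norwegian Academy of Sciences]], '
     '[[Russian Academy of Sciences]], and '
     '[[National Academy of Sciences|US National Academy of Sciences]].')

# The single-pass attempt from the question, unchanged apart from raw strings.
attempt = re.compile(r'\[\[([^|]*\|)?(.*?)\]\]')

# Collect (group 1, group 2) for every match so the captures can be inspected.
matches = [(m.group(1), m.group(2)) for m in attempt.finditer(s)]
for prefix, shown in matches:
    print(repr(prefix), '->', repr(shown))
```

Only three matches are found. In the third one, group 1 starts at the Russian Academy link and runs past its closing ]] up to the next pipe, because [^|] also matches ] characters. That is why 'Russian Academy of Sciences, and ' disappears from the output.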

Answer 1

See regex in use here

\[{2}(?:(?:(?!]{2})[^|])+\|)*((?:(?!]{2})[^|])+)]{2}

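\[{2} matches [[
(?:(?:(?!]{2})[^|])+\|)* matches the following any number of times
- (?:(?!]{2})[^|])+ tempered greedy token: matches any character one or more times, except | or a position where ]] starts
- \| matches | literally
((?:(?!]{2})[^|])+) captures the following into capture group 1
- (?:(?!]{2})[^|])+ tempered greedy token: matches any character one or more times, except | or a position where ]] starts
]{2} matches ]]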

Replacement: \1

Result:

including the Danish Academy of Sciences, Norwegian Academy of Sciences, Russian Academy of Sciences, and US National Academy of Sciences.
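
The pattern and replacement above can be tried in Python roughly as follows. This is a sketch, not the only way to write it: the pattern is spelled out in re.VERBOSE mode for readability, the ] characters are escaped (which does not change what is matched), and the names `link` and `result` are made up for illustration.

```python
import re

s = ('including the [[Royal Danish Academy of Sciences and Letters|Danish Academy of Sciences]], '
     '[[Norwegian Academy of Science and Letters|Norwegian Academy of Sciences]], '
     '[[Russian Academy of Sciences]], and '
     '[[National Academy of Sciences|US National Academy of Sciences]].')

link = re.compile(r"""
    \[{2}                        # opening [[
    (?:                          # a prefix ending in a pipe, repeated zero or more times
        (?: (?! \]{2} ) [^|] )+  #   characters that are not | and do not start ]]
        \|                       #   the literal pipe
    )*
    (                            # group 1: the text to keep
        (?: (?! \]{2} ) [^|] )+  #   same tempered token as above
    )
    \]{2}                        # closing ]]
""", re.VERBOSE)

# Replace each link with the contents of capture group 1 in a single pass.
result = link.sub(r'\1', s)
print(result)
```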

Another alternative that may work for you is the following. It is not as specific as the regex above, but it does not use any lookarounds.

\[{2}(?:[^]|]+\|)*([^]|]+)]{2}
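
To illustrate what "not as specific" means in practice, here is a small sketch comparing the two patterns. The names `strict` and `simple` are hypothetical, and the ] characters are escaped for readability, which does not change what is matched.

```python
import re

# Lookahead-based pattern from the first part of the answer.
strict = re.compile(r'\[{2}(?:(?:(?!\]{2})[^|])+\|)*((?:(?!\]{2})[^|])+)\]{2}')

# Lookaround-free alternative: no ] or | is allowed inside a name at all.
simple = re.compile(r'\[{2}(?:[^\]|]+\|)*([^\]|]+)\]{2}')

# On ordinary links both patterns behave the same.
plain = simple.sub(r'\1', 'see [[Article Name|Display Name]] and [[Article Name]]')
print(plain)

# A name containing a single ] is handled only by the lookahead version.
strict_out = strict.sub(r'\1', '[[a]b]]')   # 'a]b'
simple_out = simple.sub(r'\1', '[[a]b]]')   # left unchanged: '[[a]b]]'
print(strict_out, simple_out)
```

If link names never contain a lone ], the simpler pattern should be sufficient and is easier to read.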
