Question

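I'm basically trying to replace all the footnotes in a large text. I have a number of reasons for doing this in Objective-C, so please take that as a given constraint.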

Every footnote begins with this: [Footnote

Every footnote ends with this: ]

There can be absolutely anything between these two markers, including newlines. However, there will never be a ] between them.

So, basically, I want to match [Footnote, then match anything except ], and then match the ].

This is the closest I have come to identifying all the footnotes:

NSString *regexString = @"[\\[][F][o][o][t][n][o][t][e][^\\]\n]*[\\]]";

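This regex identifies 780 of the 889 footnotes, and none of those 780 appear to be false positives. The only ones it seems to miss are footnotes that contain line breaks.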

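I have spent a long time on www.regular-expressions.info, especially the page about the dot (http://www.regular-expressions.info/dot.html). That helped me build the regex above, but I still haven't worked out how to match any character, including a newline, except the closing bracket.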

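The following regex does capture all the footnotes, but it captures far too much text, because the greedy * runs on to the last ] it can find: (?s)[\\[][F][o][o][t][n][o][t][e].*[\\]]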

Here is some sample text to run the regex on:

  <p id="id00082">[Footnote 1: In the history of Florence in the early part of the XVIth century <i>Piero di Braccio Martelli</i> is frequently mentioned as <i>Commissario della Signoria</i>. He was famous for his learning and at his death left four books on Mathematics ready for the press; comp. LITTA, <i>Famiglie celebri Italiane</i>, <i>Famiglia Martelli di Firenze</i>.—In the Official Catalogue of MSS. in the Brit. Mus., New Series Vol. I., where this passage is printed, <i>Barto</i> has been wrongly given for Braccio.</p>

  <p id="id00083">2. <i>addi 22 di marzo 1508</i>. The Christian era was computed in Florence at that time from the Incarnation (Lady day, March 25th). Hence this should be 1509 by our reckoning.</p>

  <p id="id00084">3. <i>racolto tratto di molte carte le quali io ho qui copiate</i>. We must suppose that Leonardo means that he has copied out his own MSS. and not those of others. The first thirteen leaves of the MS. in the Brit. Mus. are a fair copy of some notes on physics.]</p>

  <p id="id00085">Suggestions for the arrangement of MSS treating of particular subjects.(5-8).</p>

When you put together the science of the motions of water, remember to include under each proposition its application and use, in order that this science may not be useless.--

[Footnote 2: A comparatively small portion of Leonardo's notes on water-power was published at Bologna in 1828, under the title: "_Del moto e misura dell'Acqua, di L. da Vinci_".]

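This example contains two footnotes and some non-footnote text. As you can see, the first footnote contains two line breaks. The second contains none.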

The first regex I mentioned above captures footnote 2 in this sample text, but it does not capture footnote 1, because that one contains line breaks.

Any improvement to my regex would be greatly appreciated.

Answer 1

Try

@"\\[Footnote[^\\]]*\\]";

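This should match across line breaks, because a negated character class also matches newlines. There is no need to put single characters into character classes.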
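A minimal sketch of how the pattern above might be used to count and replace footnotes. It assumes Foundation's NSRegularExpression (macOS 10.7+ / iOS 4+) rather than RegexKitLite; both use ICU regex syntax, so the pattern string should carry over. The function names are hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: compiles the footnote pattern from the answer.
// [^\]]* is a negated class, so it happily consumes newlines.
static NSRegularExpression *FootnoteRegex(void) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"\\[Footnote[^\\]]*\\]"
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
    }
    return regex;
}

// Hypothetical helper: counts footnotes, including multi-line ones.
static NSUInteger CountFootnotes(NSString *text) {
    return [FootnoteRegex() numberOfMatchesInString:text
                                            options:0
                                              range:NSMakeRange(0, text.length)];
}

// Hypothetical helper: replaces every footnote with `replacement`.
// The replacement is escaped so "$" and "\" are taken literally.
static NSString *ReplaceFootnotes(NSString *text, NSString *replacement) {
    NSString *escapedTemplate =
        [NSRegularExpression escapedTemplateForString:replacement];
    return [FootnoteRegex() stringByReplacingMatchesInString:text
                                                     options:0
                                                       range:NSMakeRange(0, text.length)
                                                withTemplate:escapedTemplate];
}
```

For example, `ReplaceFootnotes(@"a [Footnote 1: x\ny] b", @"<FN>")` should give `@"a <FN> b"`, even though the footnote spans two lines.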

As a commented, multi-line regex (without string escapes):

\[        # match a literal [
Footnote  # match literal "Footnote"
[^\]]*    # match zero or more characters except ]
\]        # match ]
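If you want to keep the commented form in source code, NSRegularExpression's NSRegularExpressionAllowCommentsAndWhitespace option ignores unescaped whitespace and treats "#" through end of line as a comment. A sketch, assuming NSRegularExpression rather than RegexKitLite; the helper name is hypothetical, and the comments avoid bracket characters just to keep them unambiguous.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: the commented pattern, written as adjacent
// string literals (which the compiler concatenates), one line each.
static NSRegularExpression *CommentedFootnoteRegex(void) {
    NSString *pattern =
        @"\\[        # literal opening bracket\n"
        @"Footnote   # literal word Footnote\n"
        @"[^\\]]*    # zero or more chars that are not a closing bracket\n"
        @"\\]        # literal closing bracket\n";
    NSError *error = nil;
    NSRegularExpression *regex = [NSRegularExpression
        regularExpressionWithPattern:pattern
                             options:NSRegularExpressionAllowCommentsAndWhitespace
                               error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
    }
    return regex;
}
```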

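Inside a character class ([...]), the caret ^ has a different meaning: it negates the contents of the class. So [ab] matches a or b, whereas [^ab] matches any character except a or b.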
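A quick way to check this behavior, and why it fixes the line-break problem: a negated class matches a newline, while the dot does not unless dot-all mode is on. This sketch assumes NSRegularExpression; the helper name is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if `pattern` matches anywhere in `text`.
static BOOL Matches(NSString *pattern, NSString *text) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:NULL];
    return [regex firstMatchInString:text
                             options:0
                               range:NSMakeRange(0, text.length)] != nil;
}
```

With it, `Matches(@"^[^ab]$", @"\n")` is YES, but `Matches(@"^.$", @"\n")` is NO. Likewise, `\\[Footnote[^\\]]*\\]` matches a footnote containing a line break, while `\\[Footnote.*\\]` (without `(?s)`) does not.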

Of course, this breaks down if you have nested footnotes. Text like [Footnote foo [Footnote bar] foo] would be matched from the beginning up to bar]. To avoid this, change the regex to

@"\\[Footnote[^\\]\\[]*\\]";

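so that neither opening nor closing brackets are allowed inside. Then, of course, you only match the innermost footnotes, and you have to apply the same regex to the whole text twice (or more, depending on the maximum nesting depth), "peeling" it layer by layer.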
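The layer-by-layer peeling could be sketched like this, assuming NSRegularExpression and replacing each footnote with a fixed string. The function name and the `maxPasses` guard are hypothetical; the guard stops the loop even if the replacement text itself happens to look like a footnote.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: repeatedly replaces innermost footnotes (no [ or ]
// inside) until none remain or maxPasses is reached.
static NSString *PeelFootnotes(NSString *text, NSString *replacement,
                               NSUInteger maxPasses) {
    NSRegularExpression *innermost =
        [NSRegularExpression regularExpressionWithPattern:@"\\[Footnote[^\\]\\[]*\\]"
                                                  options:0
                                                    error:NULL];
    NSString *escapedTemplate =
        [NSRegularExpression escapedTemplateForString:replacement];
    NSString *current = text;
    for (NSUInteger pass = 0; pass < maxPasses; pass++) {
        NSRange all = NSMakeRange(0, current.length);
        if ([innermost numberOfMatchesInString:current options:0 range:all] == 0) {
            break;  // nothing left to peel
        }
        // Each pass removes one nesting level.
        current = [innermost stringByReplacingMatchesInString:current
                                                      options:0
                                                        range:all
                                                 withTemplate:escapedTemplate];
    }
    return current;
}
```

For `@"[Footnote foo [Footnote bar] foo] end"` the first pass yields `@"[Footnote foo X foo] end"` (with replacement `@"X"`), and the second yields `@"X end"`.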

RegexKitLite: Match [Footnote -> match anything except ] -> match ]
