Question

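I've been trying to use regular expressions in Notepad++ to automate a large number of changes I need to make to a document, but I don't think I really understand the syntax.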

I have several pieces of text that look like this:

<a class='endnote' href='#cite1'><sup>[1]</sup></a>

The number is the only part that varies, and I want to change it to:

<ref name="cite1" />

and

    <div id='cite1'>
    <p class='cite'><sup>1</sup>a bunch of text</p>
    </div>

Again the number is the only part that varies, and I want to change it to:

 <ref name="cite1">a bunch of text</ref>

Answer 1

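First

You can replace the first string with the expression below. It verifies that the anchor tag has a class named endnote and captures the href value without the leading #.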

Regex: <a\b(?=\s)(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\sclass=['"]endnote['"])(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\shref=['"]\#(cite[^'"]*)['"])(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*\s?>.*?<\/a>

Replace with: <ref name="$1" />
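
If you want to try the pattern outside Notepad++, here is a minimal sketch in Python. It assumes the pattern is used exactly as written above; the function name `convert_endnote_links` is made up for illustration, and `\g<1>` is simply how Python spells the `$1` backreference.

```python
import re

# The first pattern from this answer, copied unchanged.
# The two lookaheads check for class="endnote" and capture the href
# value (minus the '#') regardless of attribute order.
ENDNOTE_LINK = re.compile(
    r"""<a\b(?=\s)(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\sclass=['"]endnote['"])(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\shref=['"]\#(cite[^'"]*)['"])(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*\s?>.*?<\/a>"""
)


def convert_endnote_links(html: str) -> str:
    """Replace endnote anchors with self-closing <ref> tags."""
    # \g<1> is the captured href value, e.g. "cite1"
    return ENDNOTE_LINK.sub(r'<ref name="\g<1>" />', html)


sample = "<a class='endnote' href='#cite1'><sup>[1]</sup></a>"
print(convert_endnote_links(sample))  # <ref name="cite1" />
```

Anchors without the endnote class are left untouched, because the first lookahead fails for them.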

Second

The second string can be replaced with this regular expression:

Regex: <div\b(?=\s|>)(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*\s?>.*?<p\b(?=\s)(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\sclass=['"](cite)['"])(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*\s?><sup>([^<]*)<\/sup>(.*?)<\/p>.*?<\/div>

Replace with: <ref name="$1$2">$3</ref>
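
The same kind of sketch works for the second pattern. One assumption to flag: the sample div spans several lines, so the `.` has to be allowed to match newlines. In Notepad++ that is the ". matches newline" checkbox; in Python it is `re.DOTALL`. The function name `convert_cite_divs` is hypothetical.

```python
import re

# The second pattern from this answer, copied unchanged.
# Group 1 is the literal class name "cite", group 2 the number inside
# <sup>, group 3 the note text, so "$1$2" rebuilds e.g. "cite1".
CITE_DIV = re.compile(
    r"""<div\b(?=\s|>)(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*\s?>.*?<p\b(?=\s)(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\sclass=['"](cite)['"])(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*\s?><sup>([^<]*)<\/sup>(.*?)<\/p>.*?<\/div>""",
    re.DOTALL,  # let '.' cross the line breaks inside the div
)


def convert_cite_divs(html: str) -> str:
    """Replace each cite <div> block with a <ref name="...">text</ref> tag."""
    return CITE_DIV.sub(r'<ref name="\g<1>\g<2>">\g<3></ref>', html)


sample = "<div id='cite1'>\n<p class='cite'><sup>1</sup>a bunch of text</p>\n</div>"
print(convert_cite_divs(sample))  # <ref name="cite1">a bunch of text</ref>
```

Note that the ref name comes from the class plus the number in the sup element, not from the div's id. Also, with `.` matching newlines, a div that has no cite paragraph could make the match run on into a later div, so it is worth spot-checking the result.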

Answer 2

These days you should use Parsoid to convert HTML back to wikitext, rather than inventing your own parser (yet another one).

Converting HTML to wikitext using regular expressions
