How to remove all wiki templates from a string?

Date: 2016-01-10 19:18:40

Tags: javascript regex mediawiki-templates

I have the content of a Wikipedia article, which contains things like this:

{{Use mdy dates|date=June 2014}}
{{Infobox person
| name        = Richard Matthew Stallman
| image       = Richard Stallman - Fête de l'Humanité 2014 - 010.jpg
| caption     = Richard Stallman, 2014
| birth_date  = {{Birth date and age|1953|03|16}}
| birth_place = New York City
| nationality = American
| other_names = RMS, rms
| known_for   = Free software movement, GNU, Emacs, GNU Compiler Collection|GCC
| alma_mater  = Harvard University,<br />Massachusetts Institute of Technology
| occupation  = President of the Free Software Foundation
| website     = {{URL|https://www.stallman.org/}}
| awards      =  MacArthur Fellowship<br />EFF Pioneer Award<br />''... see #Honors and awards|Honors and awards''
}}

{{Citation needed|date=May 2011}}

How can I remove them? I can use the regex /\{\{[^}]+\}\}/g, but it does not work for nested templates such as the Infobox.

I tried to use this code to remove the nested templates first and then the infobox, but the result is wrong:

var input = document.getElementById('input');
input.innerHTML = input.innerHTML.replace(/\{\{[^}]+\}\}/g, '');
<pre id="input">    {{Use mdy dates|date=June 2014}}
    {{Infobox person
    | name        = Richard Matthew Stallman
    | image       =Richard Stallman - Fête de l'Humanité 2014 - 010.jpg
    | caption     = Richard Stallman, 2014
    | birth_date  = {{Birth date and age|1953|03|16}}
    | birth_place = New York City
    | nationality = American
    | other_names = RMS, rms
    | known_for   = Free software movement, GNU, Emacs, GNU Compiler Collection|GCC
    | alma_mater  = Harvard University,<br />Massachusetts Institute of Technology
    | occupation  = President of the Free Software Foundation
    | website     = {{URL|https://www.stallman.org/}}
    | awards      =  MacArthur Fellowship<br />EFF Pioneer Award<br />''... see #Honors and awards|Honors and awards''
    }}</pre>
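To see why the question's regex leaves debris behind, here is a small sketch that runs in Node.js (no DOM needed) on a cut-down version of the infobox; the variable names are illustrative only. Because `[^}]+` happily crosses the inner `{{`, the first match ends at the nested template's `}}`, and the outer template's closing `}}` is left over.

```javascript
// The question's regex: "{{", then one or more non-"}" chars, then "}}"
const naive = /\{\{[^}]+\}\}/g;

// A cut-down infobox with one template nested inside it
const sample = [
  '{{Infobox person',
  '| birth_date  = {{Birth date and age|1953|03|16}}',
  '| birth_place = New York City',
  '}}',
].join('\n');

// [^}]+ does not stop at the inner "{{", so the first match runs from the
// outer "{{" to the first "}}" it meets (the end of the nested template)
const result = sample.replace(naive, '');
console.log(JSON.stringify(result));
// → "\n| birth_place = New York City\n}}"
```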

1 answer:

Answer 0 (score: 3)

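JavaScript regular expressions have no feature for matching nested brackets (such as recursion or balancing groups). One regex-based workaround is to process the string repeatedly with a pattern that matches only the innermost brackets, until no more replacements occur: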

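// Each pass removes only innermost templates; repeat until a pass removes nothing
do {
    var cnt = 0;
    txt = txt.replace(/{{[^{}]*(?:{(?!{)[^{}]*|}(?!})[^{}]*)*}}/g, function () {
        cnt++;  // count the templates removed in this pass
        return '';
    });
} while (cnt);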
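To try the loop above outside a browser, here is a minimal sketch that wraps it in a function and runs it on a shortened version of the question's wikitext. The names `INNERMOST_TEMPLATE` and `stripTemplatesByPasses` are hypothetical; the regex is the one from the answer. Each pass peels off one level of nesting, so the infobox disappears on the second pass, once its inner templates are gone.

```javascript
// Matches one innermost template: "{{", text containing no "{{" or "}}"
// (single braces are allowed), then "}}"
const INNERMOST_TEMPLATE = /{{[^{}]*(?:{(?!{)[^{}]*|}(?!})[^{}]*)*}}/g;

// Hypothetical helper: repeat the replace until a pass removes nothing
function stripTemplatesByPasses(text) {
  let removed;
  do {
    removed = 0;
    text = text.replace(INNERMOST_TEMPLATE, () => {
      removed++;      // one template removed in this pass
      return '';
    });
  } while (removed > 0);
  return text;
}

const wikitext = [
  '{{Use mdy dates|date=June 2014}}',
  '{{Infobox person',
  '| birth_date  = {{Birth date and age|1953|03|16}}',
  '| website     = {{URL|https://www.stallman.org/}}',
  '}}',
  'Article text.',
].join('\n');

console.log(JSON.stringify(stripTemplatesByPasses(wikitext)));
// → "\n\nArticle text."
```

The newlines that surrounded the removed templates are kept; trim or collapse them separately if needed.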

Pattern details:

{{
[^{}]* # any run of characters that are not braces
(?: # this group is only needed if you want to allow single braces
    {(?!{)[^{}]* # an opening brace not followed by another opening brace
  |   # OR
    }(?!})[^{}]* # the same for closing braces
)*
}}
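The breakdown can be checked on two tiny inputs (made up for illustration): a lone `{` or `}` inside a template is accepted thanks to the optional group, while a nested `{{` stops the outer match, so a single pass removes only the innermost template.

```javascript
// The pattern from the breakdown above, without the comments
const innermost = /{{[^{}]*(?:{(?!{)[^{}]*|}(?!})[^{}]*)*}}/g;

// Single braces inside a template are allowed by the optional group
console.log('x{{a{b}c}}y'.replace(innermost, ''));   // → xy

// A nested "{{" blocks the outer match, so only the inner template goes
console.log('x{{a{{b}}c}}y'.replace(innermost, '')); // → x{{ac}}y
```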

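If you don't want to process the string several times, you can instead read it character by character, incrementing a counter when you meet an opening pair of braces and decrementing it at a closing pair.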
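The single-scan idea could look roughly like this. It is a sketch, not code from the answer: `stripTemplatesByScan` is a hypothetical name, and it treats `{{` and `}}` as the only delimiters (triple-brace parameters are not handled specially). A stray `}}` outside any template is kept as ordinary text.

```javascript
// Hypothetical helper: one left-to-right scan with a nesting counter
function stripTemplatesByScan(text) {
  let depth = 0;  // how many "{{" are currently open
  let out = '';
  let i = 0;
  while (i < text.length) {
    if (text.startsWith('{{', i)) {
      depth++;                    // entering a (possibly nested) template
      i += 2;
    } else if (text.startsWith('}}', i) && depth > 0) {
      depth--;                    // leaving the innermost open template
      i += 2;
    } else {
      if (depth === 0) out += text[i];  // keep only text outside templates
      i++;
    }
  }
  return out;
}

console.log(stripTemplatesByScan('a{{b{{c}}d}}e')); // → ae
console.log(stripTemplatesByScan('x}}y'));          // → x}}y
```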

Another way, using split and Array.prototype.reduce:
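var stk = 0;  // current nesting depth
// The capturing group makes split() keep the "{{" and "}}" delimiters
var result = txt.split(/({{|}})/).reduce(function (c, v) {
    if (v === '{{') { stk++; return c; }                    // open a template
    if (v === '}}') { stk = stk ? stk - 1 : 0; return c; } // close one, never below 0
    return stk ? c : c + v;  // keep text only when outside every template
}, '');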
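Because `stk` lives outside the callback, an unbalanced `{{` in one input would leave it above zero for the next one. A possible refinement, sketched below with the hypothetical name `stripTemplatesBySplit`, moves the counter inside a function so it starts at 0 on every call; the logic is otherwise the same as above.

```javascript
// The split/reduce approach wrapped in a (hypothetical) function so the
// depth counter is reset on every call
function stripTemplatesBySplit(text) {
  let depth = 0;
  return text
    .split(/({{|}})/)  // the capturing group keeps the delimiters
    .reduce((kept, part) => {
      if (part === '{{') { depth++; return kept; }
      if (part === '}}') { depth = Math.max(depth - 1, 0); return kept; }
      return depth > 0 ? kept : kept + part;  // drop text inside templates
    }, '');
}

console.log(stripTemplatesBySplit('a{{b{{c}}d}}e'));         // → ae
console.log(JSON.stringify(stripTemplatesBySplit('{{oops'))); // → ""
console.log(stripTemplatesBySplit('ok'));                    // → ok
```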