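What is the best way to search and replace multiple terms?

Time: 2017-06-10 00:24:20

Tags: d

I implemented a naive function that removes HTML entities, but it performs a full string search for every entity. What is the best way to do a multi-string search and replace?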

string replace_entities(ref string x){
  return sanitize(x).replace("&rsquo;","'").replace("&lsquo;","'").replace("&apos;","'").replace("&ndash;","-").replace("&mdash;","-")
    .replace("&ldquo;","\"").replace("&rdquo;","\"").replace("&rdquo;","\"").replace("&#39;","'")
    .replace("&amp;", "&").replace("&ndash","-").replace("&mdash","-").replace("&quot;", "\"").strip();
}

1 answer:

Answer 0 (score: 1)

You could try using Regex. I put together a complete example with a focus on performance :)

import std.stdio : writeln;
import std.algorithm : reduce, find;
import std.regex : ctRegex, Captures, replaceAll;   

/*
Compile time conversion table:
["from", "to"]
*/
enum HTMLEntityTable = [
    ["&rsquo;"  ,"'"  ],
    ["&lsquo;"  ,"'"  ],
    ["&apos;"   ,"'"  ],
    ["&ndash;"  ,"-"  ],
    ["&mdash;"  ,"-"  ],
    ["&ldquo;"  ,"\"" ],
    ["&rdquo;"  ,"\"" ],
    ["&rdquo;"  ,"\"" ],
    ["&#39;"    ,"'"  ],
    ["&amp;"    ,"&"  ],
    ["&ndash"   ,"-"  ],
    ["&mdash"   ,"-"  ],
    ["&quot;"   ,"\"" ]
];

/*
Compile time Regex String:
Use reduce to concatenate HTMLEntityTable on index 0 to form "&rsquo;|&lsquo;|..."
*/
enum regex_replace = ctRegex!( 
    reduce!((a, b)=>a~"|"~b[0])(HTMLEntityTable[0][0],HTMLEntityTable[1..$]) 
);

/*
Replace Function:
Find matched string on HTMLEntityTable and replace it.
(Maybe I should store HTMLEntityTable as an associative array,
 but I think a linear search over this small table is faster.)
*/
auto HTMLReplace(Captures!string str){      
    return HTMLEntityTable.find!(a=>a[0] == str.hit)[0][1];
}

//User Function.
auto replace_entities( ref string html ){   
    return replaceAll!HTMLReplace( html, regex_replace);
}

void main(){
    auto html = "Start&rsquo;&lsquo;&apos;&ndash;&mdash;&ldquo;&rdquo;&rdquo;&#39;&amp;&ndash&mdash&quot;End";
    replace_entities( html ).writeln;
    //Output:
    //Start'''--"""'&--"End
}
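
For comparison, a single pass without Regex is also possible. The sketch below is not part of the answer above: `entityTable` and `replaceEntitiesOnePass` are hypothetical names, and the table only mirrors the entities listed in the question. It walks the input once and, at each `&`, tries the table entries in order, so it assumes the longer spelling (`&ndash;`) is listed before its prefix (`&ndash`).

```d
import std.algorithm.searching : startsWith;
import std.array : appender;
import std.stdio : writeln;
import std.string : indexOf;

// Hypothetical table (an assumption, mirroring the question's entities).
// Order matters: "&ndash;" must come before "&ndash".
immutable string[2][] entityTable = [
    ["&rsquo;", "'"], ["&lsquo;", "'"], ["&apos;", "'"], ["&#39;", "'"],
    ["&ndash;", "-"], ["&mdash;", "-"],
    ["&ldquo;", "\""], ["&rdquo;", "\""],
    ["&amp;", "&"], ["&quot;", "\""],
    ["&ndash", "-"], ["&mdash", "-"]
];

// Hypothetical helper: scans the input once and replaces known entities.
string replaceEntitiesOnePass(string html)
{
    auto result = appender!string();
    auto rest = html;
    while (rest.length)
    {
        // Copy everything up to the next '&' unchanged.
        immutable amp = rest.indexOf('&');
        if (amp < 0)
        {
            result.put(rest);
            break;
        }
        result.put(rest[0 .. amp]);
        rest = rest[amp .. $];

        // Try each known entity at the current position.
        bool matched = false;
        foreach (pair; entityTable)
        {
            if (rest.startsWith(pair[0]))
            {
                result.put(pair[1]);
                rest = rest[pair[0].length .. $];
                matched = true;
                break;
            }
        }
        // Unknown entity or bare '&': keep the character and move on.
        if (!matched)
        {
            result.put('&');
            rest = rest[1 .. $];
        }
    }
    return result.data;
}

void main()
{
    auto html = "Start&rsquo;&lsquo;&apos;&ndash;&mdash;&ldquo;&rdquo;&rdquo;&#39;&amp;&ndash&mdash&quot;End";
    writeln(replaceEntitiesOnePass(html));
    //Output:
    //Start'''--"""'&--"End
}
```

Because every position is visited only once, `&amp;ndash` comes out as `&ndash` here. With the chained `replace` calls in the question, where `&amp;` is replaced before `&ndash`, the same input would be decoded a second time and end up as `-`.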