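I've implemented a naive function that replaces HTML entities, but it does a full string search for every entity. What is the best way to do a multi-string search and replace?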
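import std.array : replace;
import std.encoding : sanitize; // assumed: replaces malformed UTF sequences
import std.string : strip;

string replace_entities(ref string x){
    return sanitize(x).replace("&rsquo;","'").replace("&lsquo;","'").replace("&#39;","'").replace("&ndash;","-").replace("&mdash;","-")
        .replace("&ldquo;","\"").replace("&rdquo;","\"").replace("&#8221;","\"").replace("&#039;","'")
        .replace("&amp;", "&").replace("&ndash","-").replace("&mdash","-").replace("&quot;", "\"").strip();
}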
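Each `.replace` call above scans (and reallocates) the whole string once, so 13 entities mean 13 passes. One alternative, sketched here as a hand-written scanner rather than the regex approach discussed below, is a single left-to-right pass that only tries the table when it sees `&`. The names `entities` and `replaceEntitiesOnePass` are hypothetical, the entity list mirrors the question's, and `sanitize`/`strip` are left out to keep the sketch small.

```d
import std.algorithm.searching : startsWith;
import std.array : appender;
import std.stdio : writeln;

// Hypothetical (entity, replacement) table. Spellings ending in ';' are listed
// before their bare prefixes ("&ndash") so the longer form is tried first.
immutable string[2][] entities = [
    ["&rsquo;", "'"], ["&lsquo;", "'"], ["&#39;", "'"], ["&#039;", "'"],
    ["&ndash;", "-"], ["&mdash;", "-"], ["&ldquo;", "\""], ["&rdquo;", "\""],
    ["&#8221;", "\""], ["&amp;", "&"], ["&quot;", "\""],
    ["&ndash", "-"], ["&mdash", "-"],
];

/// Replaces every known entity in one pass over `s`; unknown text is copied as-is.
string replaceEntitiesOnePass(string s)
{
    auto result = appender!string();
    size_t i = 0;
    while (i < s.length)
    {
        bool matched = false;
        if (s[i] == '&')
        {
            // Only at '&' do we try the table, first match wins.
            foreach (e; entities)
            {
                if (s[i .. $].startsWith(e[0]))
                {
                    result.put(e[1]);
                    i += e[0].length;
                    matched = true;
                    break;
                }
            }
        }
        if (!matched)
        {
            // Copy one UTF-8 code unit; multi-byte characters are copied byte by byte.
            result.put(s[i]);
            ++i;
        }
    }
    return result.data;
}

void main()
{
    writeln(replaceEntitiesOnePass("a&ndash;b &amp;&amp; c&quot;")); // prints: a-b && c"
}
```

Because `&amp;` is decoded in the same pass, its output is never rescanned, so input like `&amp;ndash;` becomes the literal text `&ndash;` instead of being decoded twice.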
Answer (score: 1)
You could try using a regex. I put together a complete example with an eye on performance :)
import std.stdio : writeln;
import std.algorithm : reduce, find;
import std.regex : ctRegex, Captures, replaceAll;
/*
Compile time conversion table:
["from", "to"]
*/
enum HTMLEntityTable = [
    ["&rsquo;"  ,"'"  ],
    ["&lsquo;"  ,"'"  ],
    ["&#39;"    ,"'"  ],
    ["&ndash;"  ,"-"  ],
    ["&mdash;"  ,"-"  ],
    ["&ldquo;"  ,"\"" ],
    ["&rdquo;"  ,"\"" ],
    ["&#8221;"  ,"\"" ],
    ["&#039;"   ,"'"  ],
    ["&amp;"    ,"&"  ],
    ["&ndash"   ,"-"  ],
    ["&mdash"   ,"-"  ],
    ["&quot;"   ,"\"" ]
];
/*
Compile time Regex String:
Use reduce to join element 0 (the "from" string) of every HTMLEntityTable row with "|", forming "&rsquo;|&lsquo;|..."
*/
enum regex_replace = ctRegex!(
reduce!((a, b)=>a~"|"~b[0])(HTMLEntityTable[0][0],HTMLEntityTable[1..$])
);
/*
Replace Function:
Find matched string on HTMLEntityTable and replace it.
(Maybe I should store HTMLEntityTable as an associative array,
but for a table this small I think a linear search is faster.)
*/
auto HTMLReplace(Captures!string str){
return HTMLEntityTable.find!(a=>a[0] == str.hit)[0][1];
}
//User Function.
auto replace_entities( ref string html ){
return replaceAll!HTMLReplace( html, regex_replace);
}
void main(){
    auto html = "Start&rsquo;&lsquo;&#39;&ndash;&mdash;&ldquo;&rdquo;&#8221;&#039;&amp;&ndash&mdash&quot;End";
replace_entities( html ).writeln;
//Output:
//Start'''--"""'&--"End
}
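The comment in `HTMLReplace` wonders whether an associative array would be better than `find` over the table. A minimal sketch of that variant follows, assuming a small subset of the entities. `entityMap`, `lookupEntity` and `replaceEntitiesAA` are hypothetical names. Since AA iteration order is unspecified, the pattern is written out by hand (`;`-terminated spellings before their bare prefixes) instead of being generated from the keys. Whether the hash lookup actually beats a linear scan over about a dozen entries is not established here and would need measuring.

```d
import std.regex : Captures, ctRegex, replaceAll;
import std.stdio : writeln;

// Hypothetical entity -> replacement map, filled once by the module constructor.
immutable string[string] entityMap;

shared static this()
{
    entityMap = [
        "&amp;": "&", "&quot;": "\"",
        "&ndash;": "-", "&mdash;": "-",
        "&ndash": "-", "&mdash": "-",
    ];
}

// Replacement callback: one hash lookup instead of a linear find.
// Every match comes from the pattern below, so the key is always present.
string lookupEntity(Captures!string m)
{
    return entityMap[m.hit];
}

string replaceEntitiesAA(string html)
{
    // Hand-written alternation, longer ";" spellings first so they win.
    auto re = ctRegex!`&amp;|&quot;|&ndash;|&mdash;|&ndash|&mdash`;
    return replaceAll!lookupEntity(html, re);
}

void main()
{
    writeln(replaceEntitiesAA("x &amp; y&ndash;z&mdash&quot;")); // prints: x & y-z-"
}
```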