I have a large HTML-encoded string, and I want to decode only specific whitelisted HTML tags.
Is there a way to do this in C#? WebUtility.HtmlDecode() decodes everything.
I'm looking for an implementation of DecodeSpecificTags() that passes the following test.
`[Test]
public void DecodeSpecificTags_SimpleInput_True()
{
    string input = "&lt;span&gt;i am &lt;strong color=blue&gt;very&lt;/strong&gt; big &lt;br&gt;man.&lt;/span&gt;";
    string output = "&lt;span&gt;i am <strong color=blue>very</strong> big <br>man.&lt;/span&gt;";
    List<string> whiteList = new List<string>() { "strong", "br" };
    Assert.IsTrue(DecodeSpecificTags(whiteList, input) == output);
}`
Answer 0 (score: 1)
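A better approach may be to use an HTML parser such as HtmlAgilityPack, CsQuery, or NSoup to find the specific elements and decode them in a loop.
Check this for links and examples of parsers.
Have a look; I did it with CsQuery: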
// input is fully HTML-encoded; output is the expected result with only strong and br decoded.
string input = "&lt;span&gt;i am &lt;strong color=blue&gt;very&lt;/strong&gt; big &lt;br&gt;man.&lt;/span&gt;";
string output = "&lt;span&gt;i am <strong color=blue>very</strong> big <br>man.&lt;/span&gt;";
var decoded = HttpUtility.HtmlDecode(output); // fully decoded markup
var encoded = input; // HttpUtility.HtmlEncode(decoded);
Console.WriteLine(encoded);
Console.WriteLine(decoded);
var doc = CsQuery.CQ.CreateDocument(decoded);
var paras = doc.Select("strong").Union(doc.Select("br"));
var tags = new List<KeyValuePair<string, string>>();
var counter = 0;
// Swap each whitelisted element for a placeholder key and remember its markup.
foreach (var element in paras)
{
    Console.WriteLine(HttpUtility.HtmlEncode(element.OuterHTML));
    var key = "---" + counter + "---";
    var value = HttpUtility.HtmlDecode(element.OuterHTML);
    var pair = new KeyValuePair<string, string>(key, value);
    element.OuterHTML = key;
    tags.Add(pair);
    counter++;
}
// Encode everything that is left, then put the whitelisted markup back.
var finalstring = HttpUtility.HtmlEncode(doc.Document.Body.InnerHTML);
Console.WriteLine(finalstring);
foreach (var element in tags)
{
    finalstring = finalstring.Replace(element.Key, element.Value);
}
Console.WriteLine(finalstring);
Answer 1 (score: 1)
You could do something like this:
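// Requires: using System.Net; using System.Text.RegularExpressions;
public string DecodeSpecificTags(List<string> whiteListedTagNames, string encodedInput)
{
    foreach (string s in whiteListedTagNames)
    {
        // Match an encoded opening or closing tag with this name,
        // e.g. &lt;strong color=blue&gt; or &lt;/strong&gt;.
        string regex = "&lt;" + @"\s*/?\s*" + Regex.Escape(s) + @"\b.*?" + "&gt;";
        // Decode only the matched tag text; everything else stays encoded.
        encodedInput = Regex.Replace(encodedInput, regex, m => WebUtility.HtmlDecode(m.Value));
    }
    return encodedInput;
}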
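For anyone who wants to try the regex idea end to end, here is a minimal, self-contained sketch run against the input from the question. The names TagDecoder and Program are made up for illustration, and I'm assuming the encoded tags look like &lt;name ...&gt; with no encoded &gt; inside attribute values. It is a sketch under those assumptions, not a hardened implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper class, named for this example only.
public static class TagDecoder
{
    // Decodes only the encoded tags whose names appear in the whitelist;
    // everything else in the input stays HTML-encoded.
    public static string DecodeSpecificTags(IEnumerable<string> whiteList, string encodedInput)
    {
        // Build one alternation such as "strong|br", escaping each name.
        string names = string.Join("|", whiteList.Select(name => Regex.Escape(name)));
        if (names.Length == 0)
        {
            return encodedInput;
        }

        // "&lt;", optional "/", a whitelisted name, a word boundary,
        // then lazily anything up to the next "&gt;".
        string pattern = @"&lt;\s*/?\s*(?:" + names + @")\b.*?&gt;";

        return Regex.Replace(
            encodedInput,
            pattern,
            m => WebUtility.HtmlDecode(m.Value),
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }
}

public static class Program
{
    public static void Main()
    {
        string input = "&lt;span&gt;i am &lt;strong color=blue&gt;very&lt;/strong&gt; big &lt;br&gt;man.&lt;/span&gt;";
        var whiteList = new List<string> { "strong", "br" };

        // prints: &lt;span&gt;i am <strong color=blue>very</strong> big <br>man.&lt;/span&gt;
        Console.WriteLine(TagDecoder.DecodeSpecificTags(whiteList, input));
    }
}
```

The word boundary after the tag name keeps "br" from matching a longer name that merely starts with those letters. Note that the whole matched tag is decoded, attributes included, so any attribute on a whitelisted tag comes through as live markup.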
Answer 2 (score: 0)
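Or you can use HtmlAgilityPack with either a blacklist or a whitelist, depending on your requirements. I'm using the blacklist approach: everything is decoded first, and the blacklisted elements are then re-encoded. My blacklisted tags are stored in a text file, e.g. "script|img".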
public static string DecodeSpecificTags(this string content, List<string> blackListedTags)
{
    if (string.IsNullOrEmpty(content))
    {
        return content;
    }
    blackListedTags = blackListedTags.Select(t => t.ToLowerInvariant()).ToList();
    var decodedContent = HttpUtility.HtmlDecode(content);
    var document = new HtmlDocument();
    document.LoadHtml(decodedContent);
    decodedContent = blackListedTags.Select(blackListedTag => document.DocumentNode.Descendants(blackListedTag))
        .Aggregate(decodedContent,
            (current1, nodes) =>
                nodes.Select(htmlNode => htmlNode.WriteTo())
                    .Aggregate(current1,
                        (current, nodeContent) =>
                            current.Replace(nodeContent, HttpUtility.HtmlEncode(nodeContent))));
    return decodedContent;
}
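Since the tag list lives in a pipe-separated text file, here is a rough sketch of turning a string like "script|img" into the List<string> the method expects. TagListParser is a made-up name, and the file name in the comment is only a placeholder; the exact file layout is an assumption on my part.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, named for this example only.
public static class TagListParser
{
    // Splits a pipe-separated tag list such as "script|img" into
    // trimmed, lower-cased, de-duplicated tag names.
    public static List<string> Parse(string raw)
    {
        if (string.IsNullOrWhiteSpace(raw))
        {
            return new List<string>();
        }

        return raw.Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries)
                  .Select(t => t.Trim().ToLowerInvariant())
                  .Where(t => t.Length > 0)
                  .Distinct()
                  .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        // In practice the string would come from the text file, for example
        // File.ReadAllText("blacklist.txt"); that file name is a placeholder.
        List<string> tags = TagListParser.Parse(" Script | img |script");
        Console.WriteLine(string.Join(",", tags)); // prints: script,img
    }
}
```

Lower-casing here matches the ToLowerInvariant() call in the method above, so the list can be passed straight in.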