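I want to check which HTML tags users are using in my rich HTML editor. I'm not sure how to do this in C#.
Should I use regular expressions? Which HTML tags should I blacklist or whitelist?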
Answer 0 (score: 1)
A simple whitelist approach:
string input = "<span><b>99</b> < <i>100</i></span> <!-- 99 < 100 -->";
// escape & < and >
input = input.Replace("&", "&amp;").Replace(">", "&gt;").Replace("<", "&lt;");
// unescape whitelisted tags
string output = input.Replace("&lt;b&gt;", "<b>").Replace("&lt;/b&gt;", "</b>")
                     .Replace("&lt;i&gt;", "<i>").Replace("&lt;/i&gt;", "</i>");
Output:
&lt;span&gt;<b>99</b> &lt; <i>100</i>&lt;/span&gt; &lt;!-- 99 &lt; 100 --&gt;
Rendered output (99 appears in bold and 100 in italics):
<span>99 < 100</span> <!-- 99 < 100 -->
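The same escape-then-restore idea can be applied to a list of allowed tags by looping over the list. The following is a minimal sketch, not part of the original answer. The helper name SanitizeWithWhitelist and the tag list ("b", "i", "u", "em", "strong") are hypothetical choices made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical whitelist, chosen only for illustration.
string[] allowedTags = { "b", "i", "u", "em", "strong" };

// Hypothetical helper: escape all markup, then restore only exact, attribute-free whitelisted tags.
string SanitizeWithWhitelist(string html, IEnumerable<string> tags)
{
    // Escape '&' first so the entities introduced next are not escaped twice.
    string escaped = html.Replace("&", "&amp;").Replace("<", "&lt;").Replace(">", "&gt;");
    foreach (string tag in tags)
    {
        // Literal match: "<b onclick=...>" or "<B>" stays escaped.
        escaped = escaped.Replace("&lt;" + tag + "&gt;", "<" + tag + ">")
                         .Replace("&lt;/" + tag + "&gt;", "</" + tag + ">");
    }
    return escaped;
}

string output = SanitizeWithWhitelist("<span><b>99</b> < <i>100</i></span>", allowedTags);
Console.WriteLine(output); // &lt;span&gt;<b>99</b> &lt; <i>100</i>&lt;/span&gt;
```

Matching is literal, so any tag that carries attributes or uses different casing stays escaped. The sketch also does not check that opening and closing tags are balanced.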
Answer 1 (score: 0)
Assuming the tags are entered as a single string, as on Stack Overflow, first split the string into individual tags:
string[] tags = "c# html lolcat ".Split(
new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
The whitelist/blacklist can be represented as a HashSet<T> that stores the tags:
HashSet<string> blacklist = new HashSet<string>(
StringComparer.CurrentCultureIgnoreCase) { "lolcat", "lolrus" };
Then check whether any of the tags is in the list (Any is a LINQ method, so this needs using System.Linq;):
bool invalid = tags.Any(blacklist.Contains);
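Below is a self-contained sketch that combines the snippets above and adds the whitelist counterpart of the check. With a blacklist, the input is invalid if any tag is forbidden. With a whitelist, the input is valid only if every tag is allowed. The whitelist contents are hypothetical. The sketch also uses StringComparer.OrdinalIgnoreCase instead of the answer's CurrentCultureIgnoreCase, on the assumption that tag names should not be compared with culture-sensitive rules.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Split the space-separated input; RemoveEmptyEntries drops the trailing empty entry.
string[] tags = "c# html lolcat ".Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

// Case-insensitive sets; the whitelist entries are hypothetical examples.
var blacklist = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "lolcat", "lolrus" };
var whitelist = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "c#", "html", "regex" };

// Blacklist: invalid if ANY tag is forbidden.
bool hasForbiddenTag = tags.Any(blacklist.Contains);
// Whitelist: valid only if EVERY tag is allowed.
bool allTagsAllowed = tags.All(whitelist.Contains);

Console.WriteLine($"forbidden: {hasForbiddenTag}, all allowed: {allTagsAllowed}");
```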
Answer 2 (score: 0)
You could try the Html Agility Pack. I haven't tried skipping tags with it, but it can certainly find them.
Answer 3 (score: 0)
using System.Linq;

// Keeps only the characters of stringToSanitize that appear in allowedCharacters.
// Returns null if either argument is null or empty.
string StringWhitelist(string stringToSanitize, string allowedCharacters)
{
    if (string.IsNullOrEmpty(stringToSanitize) || string.IsNullOrEmpty(allowedCharacters))
        return null;
    return new string(stringToSanitize.Where(c => allowedCharacters.IndexOf(c) != -1).ToArray());
}

// Removes every character of stringToSanitize that appears in notAllowedCharacters.
// Returns null if either argument is null or empty.
string StringBlacklist(string stringToSanitize, string notAllowedCharacters)
{
    if (string.IsNullOrEmpty(stringToSanitize) || string.IsNullOrEmpty(notAllowedCharacters))
        return null;
    return new string(stringToSanitize.Where(c => notAllowedCharacters.IndexOf(c) == -1).ToArray());
}
Usage:
StringWhitelist("Ciao", "abcdefghjklmnopqrstuvwxyz"); // Output: ao (uppercase "C" is not in the whitelist, and the whitelist string omits "i")
StringBlacklist("Ciao", "Ci"); // Output: ao (because "C" and "i" are in the blacklist)
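Both functions can also be written as a single filter that builds the character set once as a HashSet<char> and makes one pass over the input. This is only a sketch. FilterChars and its keepListed flag are hypothetical names. Unlike the answer's version, an empty character list returns a result instead of null: nothing is kept in whitelist mode, and everything is kept in blacklist mode. Null input is not handled. The last call shows that this approach filters individual characters, not tags, so blacklisting '<' and '>' mangles markup instead of validating it.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: keepListed = true acts as a whitelist, false as a blacklist.
string FilterChars(string input, string characters, bool keepListed)
{
    var set = new HashSet<char>(characters);          // built once, O(1) lookups
    var sb = new StringBuilder(input.Length);
    foreach (char c in input)
    {
        // Whitelist mode keeps listed characters; blacklist mode drops them.
        if (set.Contains(c) == keepListed)
            sb.Append(c);
    }
    return sb.ToString();
}

Console.WriteLine(FilterChars("Ciao", "abcdefghjklmnopqrstuvwxyz", keepListed: true)); // ao
Console.WriteLine(FilterChars("Ciao", "Ci", keepListed: false));                        // ao
Console.WriteLine(FilterChars("<b>hi</b>", "<>", keepListed: false));                   // bhi/b
```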