Question

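We have a string that we read from a web page. Because browsers tolerate unencoded special characters (such as the ampersand), some pages encode them and some do not. So there is a good chance that some of the data we have stored was encoded once, and some of it several times.

Is there a clean way to make sure a string is fully decoded, no matter how many times it was encoded?

Here is what we use now: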

public static string HtmlDecode(this string input)
{
     var temp = HttpUtility.HtmlDecode(input);
     while (temp != input)
     {
         input = temp;
         temp = HttpUtility.HtmlDecode(input);
     }
     return input;
}

We use the same approach with UrlDecode.
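
For reference, the UrlDecode variant could look roughly like the sketch below. It assumes `WebUtility.UrlDecode` from `System.Net` rather than `HttpUtility`, and the class and method names (`UrlDecodeExtensions`, `UrlDecodeFully`) are made up for illustration.

```csharp
using System;
using System.Net;

public static class UrlDecodeExtensions
{
    // Hypothetical helper: URL-decodes repeatedly until a pass
    // no longer changes the string.
    public static string UrlDecodeFully(this string input)
    {
        var temp = WebUtility.UrlDecode(input);
        while (temp != input)
        {
            input = temp;
            temp = WebUtility.UrlDecode(input);
        }
        return input;
    }
}
```

With this, `"a%2526b".UrlDecodeFully()` goes through `a%26b` and ends at `a&b`. Note that `WebUtility.UrlDecode` also turns `+` into a space, so a literal plus sign in already-decoded data would be altered.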

Answer 1

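That is probably the best you can do. The real fix is to rework your code so that everything is encoded exactly once, everywhere; then you only ever have to decode it once.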
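
One possible way to get there, as a rough sketch only: decode fully when the data comes in, then encode exactly once before storing it. The names `EncodingNormalizer`, `DecodeFully` and `NormalizeToSingleEncoding` are hypothetical, and the sketch assumes `WebUtility` from `System.Net`.

```csharp
using System;
using System.Net;

public static class EncodingNormalizer
{
    // Decodes repeatedly until a pass changes nothing.
    public static string DecodeFully(string input)
    {
        string decoded = WebUtility.HtmlDecode(input);
        while (decoded != input)
        {
            input = decoded;
            decoded = WebUtility.HtmlDecode(input);
        }
        return input;
    }

    // Hypothetical ingestion step: whatever state the value arrived in,
    // the stored form is HTML-encoded exactly once.
    public static string NormalizeToSingleEncoding(string raw)
    {
        return WebUtility.HtmlEncode(DecodeFully(raw));
    }
}
```

Both `"a & b"` and `"a &amp;amp; b"` come out as `"a &amp; b"`. The trade-off is that text which was meant to contain a literal `&amp;` gets collapsed as well, so this only fits data where that does not matter.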

Answer 2

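Your code appears to decode the HTML string correctly by running the decoder repeatedly until nothing changes.

However, if the input HTML is malformed, i.e. not encoded correctly, the decoded result will not be what you expect. Bad input may never decode properly, no matter how many times you run it through the decoder.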

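A quick check with two encoded strings, one fully encoded and the other only partially encoded, gives the following results.

"&lt;b&gt;" decodes to "<b>"

"&lt;b&gt" decodes to "<b&gt"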

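The check above can be reproduced with something like the following sketch. It assumes `WebUtility.HtmlDecode` from `System.Net` in place of `HttpUtility`; `MalformedEntityCheck` and `DecodeFully` are illustrative names only.

```csharp
using System;
using System.Net;

public static class MalformedEntityCheck
{
    // Same repeat-until-stable loop as in the question.
    public static string DecodeFully(string input)
    {
        string decoded = WebUtility.HtmlDecode(input);
        while (decoded != input)
        {
            input = decoded;
            decoded = WebUtility.HtmlDecode(input);
        }
        return input;
    }

    // Returns the fully decoded form of a well-formed and a malformed sample.
    public static string[] Run()
    {
        return new[]
        {
            DecodeFully("&lt;b&gt;"), // both entities end with ';'
            DecodeFully("&lt;b&gt")   // second entity lacks its ';'
        };
    }
}
```

The second sample stops at `<b&gt` because the decoder only recognizes an entity that is terminated by a semicolon, so extra passes do not help.
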
Answer 3

In case it is useful to anyone, here is a recursive version for strings that have been HTML-encoded multiple times (I find it easier to read):

public static string HtmlDecode(string input) {
    string decodedInput = WebUtility.HtmlDecode(input);

    if (input == decodedInput) {
        return input;
    }

    return HtmlDecode(decodedInput);
}

WebUtility is in the System.Net namespace.

HTML / URL decoding a string that was encoded multiple times