ASP.NET: parsing HTML to make sure it is safe. Is this OK?

Time: 2012-04-03 10:47:40

Tags: c# asp.net .net html xss

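I'm sure this has been asked many times, but I can't find anything that matches what I want. I want to be able to render HTML safely in my web page, but only allow links, <br/> and <p> tags.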

I came up with the following, but I want to make sure I haven't done anything stupid, and if there is a better way please let me know.

Code:

    // Requires: using System; using System.Linq; using System.Text.RegularExpressions;
    private string RemoveEvilTags(string value)
    {
        string[] allowed = { "<br/>", "<p>", "</p>", "</a>", "<a href" };
        string anchorPattern = @"<a[\s]+[^>]*?href[\s]?=[\s\""\']+(?<href>.*?)[\""\']+.*?>(?<fileName>[^<]+|.*?)?<\/a>";
        string safeText = value;

        System.Text.RegularExpressions.MatchCollection matches = Regex.Matches(value, anchorPattern, RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.Compiled);
        if (matches.Count > 0)
        {
            foreach (Match m in matches)
            {
                string url = m.Groups["href"].Value;
                string linkText = m.Groups["fileName"].Value;                    

                Uri testUri = null;
                if (Uri.TryCreate(url, UriKind.Absolute, out testUri) && testUri.AbsoluteUri.StartsWith("http"))
                {
                    safeText = safeText.Replace(m.Groups[0].Value, string.Format("<a href=\"{0}\" >{1}</a>", testUri.AbsoluteUri, linkText));
                }
                else
                {
                    safeText = safeText.Replace(m.Groups[0].Value, linkText);
                }
            }
        }

        // Strip every tag that is not on the allow list.
        safeText = System.Text.RegularExpressions.Regex.Replace(safeText, @"<[a-zA-Z\/][^>]*>", m => m != null && allowed.Contains(m.Value) || m.Value.StartsWith("<a href") ? m.Value : String.Empty);

        // Return the sanitized text.
        return safeText;
    }

Tests:

    [Test]
    public void EvilTagTest()
    {
        var safeText = RemoveEvilTags("this is a test <p>ok</p>");
        Assert.AreEqual("this is a test <p>ok</p>", safeText);

        safeText = RemoveEvilTags("this is a test <script>ok</script>");
        Assert.AreEqual("this is a test ok", safeText);

        safeText = RemoveEvilTags("this is a test <script><script>ok</script></script>");
        Assert.AreEqual("this is a test ok", safeText);

        //Check relative link
        safeText = RemoveEvilTags("this is a test <a href=\"bob\" >click here</a>");
        Assert.AreEqual("this is a test click here", safeText);

        //Check full link
        safeText = RemoveEvilTags("this is a test <a href=\"http://test.com/\" >click here</a>");
        Assert.AreEqual("this is a test <a href=\"http://test.com/\" >click here</a>", safeText);

        //Check full link
        safeText = RemoveEvilTags("this is a test <a href=\"https://test.com/\" >click here</a>");
        Assert.AreEqual("this is a test <a href=\"https://test.com/\" >click here</a>", safeText);

        //javascript link
        safeText = RemoveEvilTags("this is a test <a href=\"javascript:evil()\" >click here</a>");
        Assert.AreEqual("this is a test click here", safeText);

        safeText = RemoveEvilTags("this is a test <a href=\"https://test.com/\" ><script>evil();</script>click here</a>");
        Assert.AreEqual("this is a test <a href=\"https://test.com/\" >click here</a>", safeText);
    }

All the tests pass, but what am I missing?

Thanks.

2 answers:

Answer 0 (score: 2)

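As a best practice, you should not roll your own sanitizer like "RemoveEvilTags". There are a great many techniques a malicious user can use to carry out an XSS attack. Microsoft already provides an Anti-XSS library for ASP.NET: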

http://msdn.microsoft.com/en-us/library/aa973813.aspx
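
A minimal sketch of what using that library might look like, assuming the AntiXSS 4.x package is referenced and exposes `Encoder.HtmlEncode` and `Sanitizer.GetSafeHtmlFragment` in the `Microsoft.Security.Application` namespace. The names `UserHtml`, `AsText`, and `AsSafeFragment` are hypothetical. As far as I know the sanitizer applies the library's own built-in allow list, so it may keep or strip more than just links, `<p>`, and `<br/>`.

```csharp
using Microsoft.Security.Application; // AntiXSS library; assumed to be referenced (4.x API)

// Hypothetical helper; the class and method names are illustrative only.
public static class UserHtml
{
    // Use when no markup should survive: the input is encoded as plain text.
    public static string AsText(string userInput)
    {
        return Encoder.HtmlEncode(userInput ?? string.Empty);
    }

    // Use when some markup should survive: the library strips the
    // constructs it considers unsafe and returns the remaining fragment.
    public static string AsSafeFragment(string userInput)
    {
        return Sanitizer.GetSafeHtmlFragment(userInput ?? string.Empty);
    }
}
```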

Since you are using ASP.NET, Pluralsight has a good video on XSS. It focuses more on MVC, but it still applies in this case.

http://www.pluralsight-training.net/microsoft/players/PSODPlayer?author=scott-allen&name=mvc3-building-security&mode=live&clip=0&course=aspdotnet-mvc3-intro

Answer 1 (score: 0)

I would suggest using an HTML parser, such as Html Agility Pack, instead of writing code like this yourself.
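
A rough sketch of that suggestion, assuming the third-party HtmlAgilityPack NuGet package is referenced. The class name `AllowListSanitizer` and its methods are hypothetical. Instead of deleting bad tags from the input, it walks the parsed tree and rebuilds the output from scratch, so the result should contain only encoded text plus `<p>`, `<br/>`, and `<a href>` tags with absolute http/https URLs. It is not a hardened sanitizer: HtmlAgilityPack does not parse exactly like a browser, and the recursion depth is unbounded.

```csharp
using System;
using System.Net;
using System.Text;
using HtmlAgilityPack; // third-party package; assumed to be installed

// Hypothetical allow-list sanitizer built on a parsed tree.
public static class AllowListSanitizer
{
    public static string Sanitize(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html ?? string.Empty);

        var output = new StringBuilder();
        foreach (HtmlNode child in doc.DocumentNode.ChildNodes)
        {
            Append(child, output);
        }
        return output.ToString();
    }

    private static void Append(HtmlNode node, StringBuilder output)
    {
        switch (node.NodeType)
        {
            case HtmlNodeType.Text:
                // Decode entities, then re-encode, so no raw markup survives.
                output.Append(WebUtility.HtmlEncode(HtmlEntity.DeEntitize(node.InnerText)));
                return;
            case HtmlNodeType.Element:
                break;
            default:
                return; // comments and anything else are dropped
        }

        string name = node.Name.ToLowerInvariant();

        // Drop these elements together with their contents.
        if (name == "script" || name == "style") return;

        if (name == "br")
        {
            output.Append("<br/>");
            return;
        }

        string open = null;
        string close = null;
        if (name == "p")
        {
            open = "<p>";
            close = "</p>";
        }
        else if (name == "a")
        {
            string href = HtmlEntity.DeEntitize(node.GetAttributeValue("href", string.Empty));
            Uri uri;
            // Keep the link only for absolute http/https URLs; all other attributes are discarded.
            if (Uri.TryCreate(href, UriKind.Absolute, out uri) &&
                (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps))
            {
                open = "<a href=\"" + WebUtility.HtmlEncode(uri.AbsoluteUri) + "\">";
                close = "</a>";
            }
        }

        // Disallowed elements are unwrapped: their children are still visited.
        if (open != null) output.Append(open);
        foreach (HtmlNode child in node.ChildNodes)
        {
            Append(child, output);
        }
        if (close != null) output.Append(close);
    }
}
```

Unlike the question's `RemoveEvilTags`, this sketch drops the contents of `script` and `style` elements instead of keeping them as text, so the expected values in the question's script tests would have to change.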

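Your hand-written parsing code is likely to hit many unhandled corner cases; a parser should handle most of them. After parsing, you can either reject invalid input or allow only valid tags (depending on your needs).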
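
One concrete corner case, as far as I can tell from reading the question's code, is an anchor that is never closed. The anchor regex requires a closing `</a>`, so it does not match, and the final pass keeps any tag that starts with `<a href`. The sketch below copies the two patterns and the allow list from the question into a standalone program (`UnclosedAnchorDemo` is a hypothetical name) so the behaviour can be checked.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class UnclosedAnchorDemo
{
    public static void Main()
    {
        // Allow list and patterns copied from the question.
        string[] allowed = { "<br/>", "<p>", "</p>", "</a>", "<a href" };
        string anchorPattern = @"<a[\s]+[^>]*?href[\s]?=[\s\""\']+(?<href>.*?)[\""\']+.*?>(?<fileName>[^<]+|.*?)?<\/a>";
        string tagPattern = @"<[a-zA-Z\/][^>]*>";

        // A javascript: link with no closing </a>.
        string input = "this is a test <a href=\"javascript:evil()\" >click here";

        // Pass 1: the anchor pattern needs a closing </a>, so the URL check never runs.
        bool anchorSeen = Regex.IsMatch(input, anchorPattern,
            RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);
        Console.WriteLine(anchorSeen); // prints False

        // Pass 2: any tag that starts with "<a href" is kept as-is.
        string output = Regex.Replace(input, tagPattern,
            m => allowed.Contains(m.Value) || m.Value.StartsWith("<a href") ? m.Value : string.Empty);
        Console.WriteLine(output == input); // prints True: the javascript: link survives
    }
}
```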