Using LINQ to verify whether a URL exists in a List

Time: 2015-04-03 18:41:25

Tags: c# linq

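I have a list of objects, and I need to check whether a given URL is in that list. I thought LINQ would be a good way to do it, but I'm not sure how to go about it.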

var url1 = new WhiteListItem() {Url = "*.aaaaa.com/*"};
var url2 = new WhiteListItem() { Url = "www.bbbbb.com/*" };
var url3 = new WhiteListItem() { Url = "www.ccccc.com" };
var url4 = new WhiteListItem() { Url = "www.ddddd.com/ddddddd" };

var validUrls = new List<WhiteListItem> {url1, url2, url3, url4};

To clarify, I am trying to get the following results for the given URLs:

  1. True - www.aaaaa.com/something?aaa=something/something
  2. True - mobi.aaaaa.com/Something
  3. False - aaaaa.com (because there is no subdomain)
  4. True - www.bbbbb.com/something/something
  5. True - www.bbbbb.com
  6. False - mobi.bbbbb.com (because only the www subdomain is allowed)

I think you get the picture. Please help or point me in the right direction. Code samples would be highly appreciated.


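    @stovroz, thanks for getting back to me. I think this is how I should do it; here is my function. Please let me know if you see any holes. I'm not sure whether using StringBuilder is overkill?

    And one last question: how can I say that a "/" may appear at the end, but nothing is allowed after it?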

    private static Regex CreateRegularExpression(string urlString)
    {
        var sb = new StringBuilder(urlString.Trim());
    
        sb.Replace(".", @"\.");
        if (sb.ToString().EndsWith(@"/"))
        {
            sb.Append("?");
        }
    
        if (sb.ToString().EndsWith(@"/*"))
        {
            sb.Insert(sb.Length - 1, '.');
        }
    
        if (sb.ToString().IndexOf("https://", StringComparison.Ordinal) >= 0)
        {
            sb.Replace("https://", @"\bhttps://");
        }
        else if (sb.ToString().IndexOf("http://", StringComparison.Ordinal) >= 0)
        {
            sb.Replace("http://", @"\bhttp://");
        }
        else
        {
            sb = new StringBuilder(Config.AllowedProtocolRegExp + sb.ToString());
        }
    
        sb.Replace(@"://*\.", @"://[\x2DA-Za-z0-9]*\.");
    
        return new Regex(sb.ToString());
    }
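
On the last question above (a "/" may end the URL, but nothing may follow it), one common approach is to finish the pattern with `/?` followed by an end-of-input anchor. The following is only a minimal sketch, separate from the function above; the class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrailingSlashDemo
{
    // Hypothetical helper: builds a regex that matches exactly `host`,
    // with or without a single trailing slash, and nothing after it.
    public static Regex ExactWithOptionalSlash(string host)
    {
        // Regex.Escape makes "." and other metacharacters literal.
        // "/?" allows zero or one slash; "^" anchors the start and
        // "\z" anchors at the very end of the input.
        return new Regex("^" + Regex.Escape(host.TrimEnd('/')) + @"/?\z");
    }
}
```

With this sketch, `ExactWithOptionalSlash("www.ccccc.com")` matches `www.ccccc.com` and `www.ccccc.com/`, but not `www.ccccc.com/page`.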
    

2 answers:

Answer 0: (score: 3)

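I think it would be better if you could express your whitelist rules as regular expressions, either as a single composite regex or as a list of separate expressions, and then check whether any of them match, e.g.: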

var whitelist = new [] {@".*\.aaaaa\.com/.*", @"www\.bbbbb\.com/.*"};
var list = new [] { "mobi.aaaaa.com/Something", "mobi.bbbbb.com/" };
var matches = list.Where(x => whitelist.Any(y => Regex.IsMatch(x, y)));
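
The "single composite regex" option mentioned above could be sketched by joining the individual patterns with `|`. This is an illustrative sketch only; the class name and the choice to wrap each pattern in a non-capturing group are my own assumptions, not part of the answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CompositeWhitelist
{
    // Combine separate patterns into one alternation: (?:p1)|(?:p2)|...
    // Each pattern is wrapped in a non-capturing group so that a "|"
    // inside one pattern cannot leak into its neighbours.
    public static Regex Combine(params string[] patterns)
    {
        return new Regex(string.Join("|", patterns.Select(p => "(?:" + p + ")")));
    }
}
```

For example, `CompositeWhitelist.Combine(@"^.*\.aaaaa\.com/.*$", @"^www\.bbbbb\.com/.*$")` produces one regex that matches `mobi.aaaaa.com/Something` and rejects `mobi.bbbbb.com/`.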

Update

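Since you already have a large number of patterns to match in wildcard syntax, you can first convert them to Regex syntax with the following function: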

public string WildcardToRegex(string pattern)
{
  return "^" + Regex.Escape(pattern).
  Replace("\\*", ".*").
  Replace("\\?", ".") + "$";
}

(From http://www.codeproject.com/Articles/11556/Converting-Wildcards-to-Regexes)

So:

var wildcardWhitelist = new [] { "*.aaaaa.com/*", "www.bbbbb.com/*" };
var regexWhitelist = wildcardWhitelist.Select(x => WildcardToRegex(x));
var list = new [] { "mobi.aaaaa.com/Something", "mobi.bbbbb.com/" };
var matches = list.Where(x => regexWhitelist.Any(y => Regex.IsMatch(x, y)));
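
To tie this back to the question's `List<WhiteListItem>`, here is a rough sketch of a single-URL check built on the same wildcard conversion. The `WhiteListItem` class is a stand-in (the question only shows its `Url` property), and `UrlWhitelist` and `IsAllowed` are names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Stand-in for the question's WhiteListItem; only Url is assumed.
public class WhiteListItem
{
    public string Url { get; set; }
}

public static class UrlWhitelist
{
    // Same conversion as WildcardToRegex above: escape everything,
    // then turn the escaped wildcards back into regex equivalents.
    public static string WildcardToRegex(string pattern)
    {
        return "^" + Regex.Escape(pattern)
            .Replace("\\*", ".*")
            .Replace("\\?", ".") + "$";
    }

    // True when at least one whitelist entry matches the whole URL.
    public static bool IsAllowed(IEnumerable<WhiteListItem> whitelist, string url)
    {
        return whitelist.Any(item => Regex.IsMatch(url, WildcardToRegex(item.Url)));
    }
}
```

With the question's four entries, `IsAllowed` returns true for cases 1, 2 and 4 and false for cases 3 and 6. Case 5 (`www.bbbbb.com` with no trailing slash) comes out false, because `www.bbbbb.com/*` converts to a pattern that requires the `/`; if that case must pass, the entry needs extra handling, for example treating a trailing `/*` as optional.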

Answer 1: (score: 3)

var urls = new List<WhiteListItem>
{
    new WhiteListItem() {Url = "*.aaaaa.com/*"},
    new WhiteListItem() { Url = "www.bbbbb.com/*" },
    new WhiteListItem() { Url = "www.ccccc.com" },
    new WhiteListItem() { Url = "www.ddddd.com/ddddddd" }
};
var validatedUrls = urls.Select(u => new 
{
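    // Placeholder check: replace "" with the URL being tested. Note that u.Url
    // must be a valid regular expression here (a raw wildcard such as
    // "*.aaaaa.com/*" is not); alternatively, call your own matching method.
    IsPassed = Regex.IsMatch("",u.Url),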
    Url = u.Url,
}).ToList();

var goodUrls = validatedUrls.Where(u=> u.IsPassed).Select(u=>u.Url);
var badUrls = validatedUrls.Where(u=> !u.IsPassed).Select(u=>u.Url);
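
The same good/bad split can also be applied to the candidate URLs rather than to the whitelist entries. The sketch below is one possible way to do that with `ToLookup`; `UrlPartitioner` and `Partition` are hypothetical names, and the wildcard conversion is borrowed from the other answer rather than being part of this one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class UrlPartitioner
{
    // Splits candidates into allowed (key true) and rejected (key false).
    public static ILookup<bool, string> Partition(
        IEnumerable<string> wildcardWhitelist, IEnumerable<string> candidates)
    {
        // Convert each wildcard entry to an anchored regex once, up front,
        // instead of re-parsing the pattern for every candidate.
        var regexes = wildcardWhitelist
            .Select(w => new Regex("^" + Regex.Escape(w).Replace("\\*", ".*") + "$"))
            .ToList();

        return candidates.ToLookup(url => regexes.Any(r => r.IsMatch(url)));
    }
}
```

Given `var result = UrlPartitioner.Partition(whitelist, candidates);`, the allowed URLs are `result[true]` and the rejected ones are `result[false]`; a lookup returns an empty sequence for a key with no entries.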