Question

I want to count the href links that contain an underscore.

I'm using a regular expression to find all hrefs, but I can't get just the hrefs that contain the underscore (_) character.

  MatchCollection hyperlinks = Regex.Matches(strIn, @"<(?<Tag_Name>(a)|img)\b[^>]*?\b(?<URL_Type>(?(1)href|src))\s*=\s*(?:""(?<URL>(?:\\""|[^""])*)""|'(?<URL>(?:\\'|[^'])*)')", RegexOptions.IgnoreCase | RegexOptions.Multiline);

Example:

<a href="http://hyderabad.yalwa.in/Building_Construction/G/"
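For comparison, here is a minimal regex-only sketch of what the question asks for. It is written in JavaScript rather than C# and is not a port of the pattern above: it drops the a/img conditional group and assumes href values are wrapped in matching " or ' quotes with no > inside the tag. It collects every href value first and then keeps only those containing an underscore, so the underscore test does not have to be encoded in the regex itself. The function name countUnderscoreHrefs is hypothetical.

```javascript
// Hypothetical helper: counts <a> tags whose href value contains "_".
// Assumptions: href values are quoted with " or ', and no ">" appears
// inside the tag before the href attribute.
function countUnderscoreHrefs(html) {
  // "<a" + whitespace, optionally other attributes ending in whitespace,
  // then href. Group 1 is the opening quote, group 2 is the href value.
  // Requiring whitespace before "href" avoids matching data-href.
  const hrefPattern = /<a\s+(?:[^>]*?\s)?href\s*=\s*(["'])(.*?)\1/gi;

  const hrefs = [];
  for (const match of html.matchAll(hrefPattern)) {
    hrefs.push(match[2]);
  }

  // Keep only the hrefs that contain an underscore.
  return hrefs.filter((href) => href.includes("_")).length;
}
```

Even this simplified pattern breaks on unquoted attributes or a > inside an earlier attribute value, which is why a real HTML parser is the safer choice.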

Answer 1

var _len = $("a[href*='_']").length;

Use the *= (contains) wildcard to select all <a> elements whose href contains _.

Explanation:

$("a")              // Selects all <a> elements
$("a[href='1234']") // Selects all <a> elements whose href is exactly equal to 1234
$("a[href*='_']")   // Selects all <a> elements whose href contains the string _

Because jQuery always returns an array-like collection of the matched elements, .length gives you the count.
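If jQuery is not available, the same attribute selector also works with the standard DOM method querySelectorAll, which returns a NodeList whose length is the count. This is a minimal sketch; the function name countLinksWithUnderscore is hypothetical, and it accepts any root that exposes querySelectorAll (for example document or an element).

```javascript
// Hypothetical helper: counts <a> elements whose href attribute contains "_".
// root can be document, an Element, or anything exposing querySelectorAll.
function countLinksWithUnderscore(root) {
  // a[href*='_'] is a standard CSS attribute selector:
  // *= matches when the attribute value contains the given substring.
  return root.querySelectorAll("a[href*='_']").length;
}

// In a browser: countLinksWithUnderscore(document)
```

Like the jQuery version, this matches the href attribute text as written in the markup, not the resolved absolute URL.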

Answer 2

As before, I suggest using HtmlAgilityPack.

The only change from my previous approach is the XPath: //a[contains(@href,'_')]. It selects all <a> tags whose href attribute contains the _ character.

See this code:

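public int HtmlAgilityPackCountAwithUnderscore(string html)
{
    HtmlAgilityPack.HtmlDocument hap;
    Uri uriResult;
    // If the input is an absolute http/https URL, download and parse that page
    if (Uri.TryCreate(html, UriKind.Absolute, out uriResult)
        && (uriResult.Scheme == Uri.UriSchemeHttp || uriResult.Scheme == Uri.UriSchemeHttps))
    { // html is a URL
        var web = new HtmlAgilityPack.HtmlWeb();
        hap = web.Load(uriResult.AbsoluteUri);
    }
    else
    { // html is a string of markup
        hap = new HtmlAgilityPack.HtmlDocument();
        hap.LoadHtml(html);
    }
    // Select every <a> element whose href attribute contains "_"
    var nodes = hap.DocumentNode.SelectNodes("//a[contains(@href,'_')]");
    // SelectNodes returns null when nothing matches, so report a count of 0
    return nodes != null ? nodes.Count : 0;
}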

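I strongly recommend switching to a proper HTML-parsing approach; otherwise you will waste a lot of valuable time figuring out where your regular expression goes wrong.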

How to extract the hrefs that contain underscores from all hrefs using a regular expression in C#

2 answers