Remove values between two values

Date: 2016-10-17 23:33:33

Tags: c# html-agility-pack

This may be a bit complicated, but I kept trying and got a result. I am using HtmlAgilityPack to collect video links from a website.

HtmlWeb hw = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = hw.Load(@"C:\Users\e9396\Desktop\r.html");
foreach (HtmlNode links in doc.DocumentNode.SelectNodes("//a[@href]"))
{
    if (links.NextSibling != null)
    {
        ArrayList ArrayLinksList = new ArrayList();
        ArrayLinksList.Add(links.Attributes["href"].Value);
        listbox.Items.AddRange(ArrayLinksList.ToArray());
    }
}

But the links I get back look like this:

/video/93409905175
/dk?cmd=VideoVitrinaPopup&st.redirect=myVideo&st.vvp_cmd=VideoVitrinaPopupMovieEdit&st.vv_movieId=93409905175&st.cmd=userMain
/video/93361801751
/dk?cmd=VideoVitrinaPopup&st.redirect=myVideo&st.vvp_cmd=VideoVitrinaPopupMovieEdit&st.vv_movieId=93361801751&st.cmd=userMain
/video/93442476567
/dk?cmd=VideoVitrinaPopup&st.redirect=myVideo&st.vvp_cmd=VideoVitrinaPopupMovieEdit&st.vv_movieId=93442476567&st.cmd=userMain
/video/93409839639
/dk?cmd=VideoVitrinaPopup&st.redirect=myVideo&st.vvp_cmd=VideoVitrinaPopupMovieEdit&st.vv_movieId=93409839639&st.cmd=userMain
/video/93442411031
/dk?cmd=VideoVitrinaPopup&st.redirect=myVideo&st.vvp_cmd=VideoVitrinaPopupMovieEdit&st.vv_movieId=93442411031&st.cmd=userMain
/video/93442345495
/dk?cmd=VideoVitrinaPopup&st.redirect=myVideo&st.vvp_cmd=VideoVitrinaPopupMovieEdit&st.vv_movieId=93442345495&st.cmd=userMain
/video/93461940759
/dk?cmd=VideoVitrinaPopup&st.redirect=myVideo&st.vvp_cmd=VideoVitrinaPopupMovieEdit&st.vv_movieId=93461940759&st.cmd=userMain

Links like "/video/93409905175" are fine.

But I want to remove links like this:

"/video/93409905175 /dk?cmd=VideoVitrinaPopup&st.redirect=myVideo&st.vvp_cmd=VideoVitrinaPopupMovieEdit&st.vv_movieId=93409905175&st.cmd=userMain"

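I can't remove it, because the ID in the marked position (shown in bold in the original post) is different for each link.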

This is the result I want. Thanks.

/video/93409905175
/video/93361801751
/video/93442476567
/video/93409839639
/video/93442411031
/video/93442345495
/video/93461940759

2 answers:

Answer 0 (score: 1)

Use this function:

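// Requires: using System.Collections.Generic; using System.Linq;
//           using System.Text.RegularExpressions; using HtmlAgilityPack;
public static IEnumerable<string> FilterLinks(HtmlDocument doc, string regexFilter)
{
    var regex = new Regex(regexFilter);
    // SelectNodes returns null when no node matches, so fall back to an empty sequence.
    IEnumerable<HtmlNode> anchors =
        doc.DocumentNode.SelectNodes("//a[@href]") ?? Enumerable.Empty<HtmlNode>();
    return anchors
        .Where(n => n.NextSibling != null &&
                    regex.IsMatch(n.GetAttributeValue("href", string.Empty)))
        .Select(n => n.GetAttributeValue("href", string.Empty));
}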

Call it like this:

foreach(var link in FilterLinks(doc, @"^\/video\/[0-9]*")) listbox.Items.Add(link);
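
To see what the pattern does without loading a page, here is a minimal sketch that applies the same regular expression to a few of the hrefs from the question. The class name LinkFilterDemo and the method KeepVideoLinks are hypothetical names introduced only for illustration; the sketch uses nothing beyond Regex and LINQ from the .NET base library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LinkFilterDemo
{
    // Same pattern as in the call above: the href must start with "/video/".
    private static readonly Regex VideoLink = new Regex(@"^\/video\/[0-9]*");

    // Hypothetical helper: keeps only the hrefs that match the pattern.
    public static IEnumerable<string> KeepVideoLinks(IEnumerable<string> hrefs)
    {
        return hrefs.Where(h => VideoLink.IsMatch(h));
    }

    public static void Main()
    {
        var hrefs = new[]
        {
            "/video/93409905175",
            "/dk?cmd=VideoVitrinaPopup&st.redirect=myVideo&st.vvp_cmd=VideoVitrinaPopupMovieEdit&st.vv_movieId=93409905175&st.cmd=userMain",
            "/video/93361801751",
        };

        // Prints the two "/video/..." links and skips the "/dk?..." one.
        foreach (var link in KeepVideoLinks(hrefs))
        {
            Console.WriteLine(link);
        }
    }
}
```

Note that [0-9]* also accepts zero digits and the pattern has no end anchor, so a bare "/video/" or "/video/abc" would pass as well. If that matters for your page, a stricter pattern such as @"^/video/[0-9]+$" is probably closer to what you want.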

Answer 1 (score: 0)

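Thanks to Travis Sharp for the improvement, but the types used by the link-filtering method may not be suitable as written. Here HtmlDocument is fully qualified as HtmlAgilityPack.HtmlDocument: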

public static IEnumerable<string> FilterLinks(HtmlAgilityPack.HtmlDocument doc, string regexFilter)
{
    var regex = new Regex(regexFilter);
    // SelectNodes returns null when no node matches, so fall back to an empty sequence.
    IEnumerable<HtmlNode> anchors =
        doc.DocumentNode.SelectNodes("//a[@href]") ?? Enumerable.Empty<HtmlNode>();
    return anchors
              .Where(n => n.NextSibling != null
                       && regex.IsMatch(n.GetAttributeValue("href", string.Empty)))
              .Select(n => n.GetAttributeValue("href", string.Empty));
}

We store the return value of FilterLinks in a variable and then loop over it:

var xLinkeler = FilterLinks(doc, @"^\/video\/[0-9]*");
foreach (var iett in xLinkeler)
{
    listbox.Items.Add(iett);
}
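
If a regular expression feels heavy for this case, a plain prefix check is one possible alternative. This is a different technique from the one used in the answers above, shown only as a sketch; the names PrefixFilterDemo and KeepByPrefix are hypothetical, and it assumes every link you want starts with "/video/", as in the sample list from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PrefixFilterDemo
{
    // Hypothetical helper: keeps hrefs that begin with the given prefix.
    // Ordinal comparison is used because URLs are not culture-sensitive text.
    public static IEnumerable<string> KeepByPrefix(IEnumerable<string> hrefs, string prefix)
    {
        return hrefs.Where(h => h.StartsWith(prefix, StringComparison.Ordinal));
    }

    public static void Main()
    {
        var hrefs = new[]
        {
            "/video/93442476567",
            "/dk?cmd=VideoVitrinaPopup&st.redirect=myVideo&st.vvp_cmd=VideoVitrinaPopupMovieEdit&st.vv_movieId=93442476567&st.cmd=userMain",
        };

        // Only the "/video/..." link is written out.
        foreach (var link in KeepByPrefix(hrefs, "/video/"))
        {
            Console.WriteLine(link);
        }
    }
}
```

The prefix check does not verify that the rest of the href is numeric, so the regular expression remains the better fit if you need that guarantee.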