Html Agility Pack - title and link problem

Time: 2014-05-05 18:41:18

Tags: html html-agility-pack

<article class="four columns">
<header class="four columns alpha">
    <h2 class="trunker">
        <span id="MainContentPlaceHolder_ctl12_ctl16_MovieTitleH2" title="En du elsker">En du elsker</span>
    </h2>
    <hr />
</header>
<figure class="two columns alpha">


    <div id="MainContentPlaceHolder_ctl12_ctl16_insertVignetTop"></div>
    <div id="MainContentPlaceHolder_ctl12_ctl16_insertVignetBottom"></div>
    <div id="MainContentPlaceHolder_ctl12_ctl16_insertTrailer"><div id="playerPoster" class="playTrailer" name="14532" onclick="x=1;"></div></div>
    <a id="MainContentPlaceHolder_ctl12_ctl16_MovieDetailsHyperLink" title="En du elsker" href="MovieDetails.aspx?movieId=3383"><img src="http://mother.poweredbyintegra.dk/posters/enduelsker_hoej_m.jpg" id="MainContentPlaceHolder_ctl12_ctl16_ImageUrlImg" /></a>
</figure>
<div id="MainContentPlaceHolder_ctl12_ctl16_ShowTimesDiv" class="two columns omega">

<span class="ticket"><input name="ctl00$MainContentPlaceHolder$ctl12$ctl16$ctl03" type="button" class="ticket" value="Læs mere" onclick="location.href=&#39;MovieDetails.aspx?movieId=3383&#39;" /></span><span class="ticket"><input name="ctl00$MainContentPlaceHolder$ctl12$ctl16$ctl04" type="button" class="ticket" value="18:30" onclick="location.href=&#39;OrderMovieTicket.aspx?showId=11837&#39;" /></span></div>

I need title="En du elsker" and href="MovieDetails.aspx?movieId=3383" to be picked up together as a pair, and I want the same done for the next 3 movies.

This is how I tried it:

@using HtmlAgilityPack;

@{
HtmlWeb hw = new HtmlWeb(); 
hw.AutoDetectEncoding = true;
hw.OverrideEncoding = System.Text.Encoding.GetEncoding("ISO-8859-1");

HtmlDocument doc = hw.Load("ronnebio.dk/NextDaysProgramme.aspx?offset=0");

 //doc.DetectEncodingAndLoad(
 List<string> temp = new List<string>();
 int count = 1;

foreach(HtmlNode link in doc.DocumentNode.SelectNodes("//div[@class='inner clearfix']"))
{
    if (count > 3)
    {
        break; 
    }

    string linkhref = link.GetAttributeValue("href", "");
    string titel = link.InnerText;
        if (linkhref != "" 
        && linkhref.Contains("MovieDetails.aspx")
        && !temp.Contains(titel))
    {
        temp.Add(titel);
        count++;

        <div class="nyhedlink"><a href="@linkhref" target="_blank">- @titel</a></div>
    }
}
}

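I can't find the problem. Hope you can help me solve it. Thanks.

2 answers:

Answer 0: (score: 0)

Going by the URL you posted, you can get just the first three a tags with the following query: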

var list = (from item in doc.DocumentNode.Descendants("a")
                       where
                           item.ParentNode.Name.Equals("figure") &&
                           // GetAttributeValue returns the default instead of throwing when the attribute is missing
                           item.ParentNode.GetAttributeValue("class", "") == "two columns alpha"
                       select new
                              {
                                  Title = item.GetAttributeValue("title", ""),
                                  Link = item.GetAttributeValue("href", "")
                              }).Take(3);

Then you can do this:

foreach (var item in list)
{
    /* for example */
    <div class="nyhedlink"><a href="@item.Link" target="_blank">- @item.Title</a></div>       
}
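
The same selection can also be written as an XPath query, which is what the question's code was attempting. What follows is only a minimal sketch, not the answerer's method: the class name FigureLinks, the helper Extract, and the three "Sample film" entries in the inline markup are made up for illustration (only the first figure comes from the question). It parses an inline string, so it does not depend on the live page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class FigureLinks
{
    // Sample markup: the first figure is from the question, the rest are hypothetical.
    public const string Sample = @"
<figure class=""two columns alpha""><a title=""En du elsker"" href=""MovieDetails.aspx?movieId=3383""><img src=""a.jpg"" /></a></figure>
<figure class=""two columns alpha""><a title=""Sample film 2"" href=""MovieDetails.aspx?movieId=2""><img src=""b.jpg"" /></a></figure>
<figure class=""two columns alpha""><a title=""Sample film 3"" href=""MovieDetails.aspx?movieId=3""><img src=""c.jpg"" /></a></figure>
<figure class=""two columns alpha""><a title=""Sample film 4"" href=""MovieDetails.aspx?movieId=4""><img src=""d.jpg"" /></a></figure>";

    // Hypothetical helper: returns up to `max` (title, href) pairs for the poster links.
    public static List<KeyValuePair<string, string>> Extract(string html, int max)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Select the <a> elements that sit directly inside the poster <figure>.
        // SelectNodes returns null rather than an empty collection when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes(
            "//figure[@class='two columns alpha']/a[@href]");
        if (anchors == null)
            return new List<KeyValuePair<string, string>>();

        return anchors
            .Take(max)
            .Select(a => new KeyValuePair<string, string>(
                a.GetAttributeValue("title", ""),
                a.GetAttributeValue("href", "")))
            .ToList();
    }

    static void Main()
    {
        foreach (var link in Extract(Sample, 3))
            Console.WriteLine(link.Key + " -> " + link.Value);
    }
}
```

The question's loop selected div elements and then read href from them, but a div has no href attribute, so the condition inside the loop could never be true. Selecting the a elements themselves avoids that.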

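Answer 1: (score: 0)

Thx m8... I'm still not quite sure where you put the first piece of code. Written this way it only finds one text, but it keeps repeating it many times. I only need the first 3 titles and hrefs.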

 @using HtmlAgilityPack;

@{
    HtmlWeb hw = new HtmlWeb(); 
    hw.AutoDetectEncoding = true;
    hw.OverrideEncoding = System.Text.Encoding.GetEncoding("ISO-8859-1");

    HtmlDocument doc = hw.Load("http://ronnebio.dk/NextDaysProgramme.aspx?offset=0");

     //doc.DetectEncodingAndLoad(
     List<string> temp = new List<string>();
     int count = 1;

    var element = (from item in doc.DocumentNode.Descendants("a")
               where item.Id == "MainContentPlaceHolder_ctl12_ctl16_MovieDetailsHyperLink"
               select new
                      {
                          Title = item.Attributes["title"].Value,
                          Link = item.Attributes["href"].Value
                      }).First();

            <div class="nyhedlink"><a href="@element.Link" target="_blank">- @element.Title</a></div>

}
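
A likely reason only one entry shows up is that the query above matches one exact id and ends with First(), so it can never return more than a single link. Below is a minimal sketch of one way around that. It assumes the other movie links on the page use ids that end with the same "MovieDetailsHyperLink" suffix, which the thread does not confirm. The class name MovieLinks, the helper FirstLinks, and the second and third anchors in the sample markup are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class MovieLinks
{
    // Sample markup: the first anchor is from the question, the others are hypothetical.
    public const string Sample = @"
<a id=""MainContentPlaceHolder_ctl12_ctl16_MovieDetailsHyperLink"" title=""En du elsker"" href=""MovieDetails.aspx?movieId=3383"">x</a>
<a id=""MainContentPlaceHolder_ctl12_ctl17_MovieDetailsHyperLink"" title=""Sample film 2"" href=""MovieDetails.aspx?movieId=2"">x</a>
<a id=""SomethingElse"" title=""Not a movie"" href=""Other.aspx"">x</a>
<a id=""MainContentPlaceHolder_ctl12_ctl18_MovieDetailsHyperLink"" title=""Sample film 3"" href=""MovieDetails.aspx?movieId=3"">x</a>";

    // Hypothetical helper: up to `max` (title, href) pairs for anchors whose id
    // ends with the assumed suffix "MovieDetailsHyperLink".
    public static List<KeyValuePair<string, string>> FirstLinks(string html, int max)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode.Descendants("a")
            .Where(a => a.GetAttributeValue("id", "")
                         .EndsWith("MovieDetailsHyperLink", StringComparison.Ordinal))
            .Select(a => new KeyValuePair<string, string>(
                a.GetAttributeValue("title", ""),
                a.GetAttributeValue("href", "")))
            .Take(max)   // Take(3) instead of First() keeps the first three matches
            .ToList();
    }

    static void Main()
    {
        foreach (var link in FirstLinks(Sample, 3))
            Console.WriteLine(link.Key + " -> " + link.Value);
    }
}
```

In the Razor view, the div with class nyhedlink would then go inside a foreach over the returned list, as in the first answer.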