Web scraping - WP8 - HTMLAgilityPack

Time: 2015-02-07 16:19:37

Tags: c# windows windows-phone-8

Please tell me what is wrong with my attempt to get the lyrics from http://www.azlyrics.com/lyrics/paparoach/coffeethoughts.html. I want to retrieve only the lyrics. Thanks in advance.

    protected async override void OnNavigatedTo(NavigationEventArgs e)
    {
        base.OnNavigatedTo(e);
        string htmlPage = "";
        using (var client = new HttpClient())
        {
            htmlPage = await client.GetStringAsync("http://www.azlyrics.com/lyrics/paparoach/coffeethoughts.html/");
        }

        HtmlDocument htmlDocument = new HtmlDocument();
        htmlDocument.LoadHtml(htmlPage);

        List<Lyrics> lyrics = new List<Lyrics>();

        foreach (var div in htmlDocument.DocumentNode.SelectNodes("//div[@style='margin-left:10px;margin-right:10px']"))
        {
            Lyrics newMovie = new Lyrics();
            newMovie.Summary = div.SelectSingleNode("br\\").InnerText.Trim();
            //newMovie.Summary = div.SelectSingleNode(".//div[@id='lyrics']").InnerText.Trim();
            //newMovie.Title = div.SelectSingleNode(".//div[@class='title']").InnerText.Trim();
            lyrics.Add(newMovie);
        }

        lstMovies.ItemsSource = lyrics;
    }
}

}

1 answer:

Answer 0: (score: 0)

Your query is wrong.

//div[@style='margin-left:10px;margin-right:10px']

should be

//div[@id='main']/div[3]
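
A minimal sketch of how the corrected query might be applied, assuming the lyrics really sit in the third div under the element with id 'main' (the page layout may have changed since this was written) and that the HtmlAgilityPack build in use supports XPath. The class name LyricsScraper, the helper ExtractLyrics, and the sample markup are hypothetical and only there for illustration; the sample string stands in for the downloaded page.

```csharp
using System;
using HtmlAgilityPack;

public static class LyricsScraper
{
    // Hypothetical helper: returns the text of the node matched by the
    // XPath from the answer, or null when nothing matches.
    public static string ExtractLyrics(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when the query matches nothing,
        // so check before touching InnerText.
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@id='main']/div[3]");
        if (node == null)
        {
            return null;
        }

        // Decode entities such as &amp; and trim surrounding whitespace.
        return HtmlEntity.DeEntitize(node.InnerText).Trim();
    }

    public static void Main()
    {
        // Made-up markup; NOT the real structure of the lyrics page.
        string html =
            "<html><body><div id='main'>" +
            "<div>banner</div><div>title</div>" +
            "<div>line one<br>line two</div>" +
            "</div></body></html>";

        Console.WriteLine(ExtractLyrics(html) ?? "(no match)");
    }
}
```

In the question's code, the same null check would apply to the result of SelectNodes, which also returns null rather than an empty collection when nothing matches. Note that InnerText only concatenates text nodes, so line breaks written as br elements will likely need separate handling.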

I wrote an article about scraping: Get content from a webpage or “How to Scrape the Sky”


By the way, azlyrics.com is powered by musicxmatch. Maybe you should check out their API instead of scraping? It is always safer to drink water from the source.