Html Agility Pack, SelectSingleNode

Date: 2017-02-03 16:22:40

Tags: c# .net html-agility-pack

This code works:

        WebClient client = new WebClient();
        client.Encoding = Encoding.UTF8;
        html = client.DownloadString("http://www.imdb.com/chart/moviemeter?ref_=nv_mv_mpm_8");
        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html);
        MessageBox.Show(doc.DocumentNode.SelectSingleNode("//*[@id='main']/div/span/div/div/div[3]/table/tbody/tr[1]/td[2]/a").InnerText);

Here is the HTML it matches:

<a href="/title/tt4972582/?pf_rd_m=A2FGELUUNOQJNL&amp;pf_rd_p=2240084082&amp;pf_rd_r=1QW31NGD6JSE46F79CKQ&amp;pf_rd_s=center-1&amp;pf_rd_t=15506&amp;pf_rd_i=moviemeter&amp;ref_=chtmvm_tt_1" title="M. Night Shyamalan (dir.), James McAvoy, Anya Taylor-Joy">Split</a>

The MessageBox shows the text "Split". But look at this HTML:

<div class="summary_text" itemprop="description">
                Three girls are kidnapped by a man with a diagnosed 23 distinct personalities, and must try and escape before the apparent emergence of a frightful new 24th.
        </div>

I want the MessageBox to show the text that starts with "Three girls are kidn...", so I wrote this code:

        WebClient client2 = new WebClient();
        client2.Encoding = Encoding.UTF8;
        HtmlAgilityPack.HtmlDocument doc2 = new HtmlAgilityPack.HtmlDocument();
        doc2.LoadHtml(client2.DownloadString("http://www.imdb.com/title/tt4972582/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084082&pf_rd_r=1QW31NGD6JSE46F79CKQ&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=moviemeter&ref_=chtmvm_tt_1"));
        MessageBox.Show(doc2.DocumentNode.SelectSingleNode("//*[@id='title-overview-widget']/div[3]/div[1]/div[1]").InnerText);

When I run this code, an unhandled exception of type "System.NullReferenceException" is thrown.

The XPaths are correct, I have checked them a hundred times, so what should I do?

1 answer:

Answer 0: (score: 2)

Can you try this?

        HtmlWeb web = new HtmlWeb();
        HtmlDocument doc = web.Load("http://www.imdb.com/title/tt4972582/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=2240084082&pf_rd_r=1QW31NGD6JSE46F79CKQ&pf_rd_s=center-1&pf_rd_t=15506&pf_rd_i=moviemeter&ref_=chtmvm_tt_1");
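        // Needs "using System.Linq;" and "using HtmlAgilityPack;".
        // FirstOrDefault returns null when no div matches, so ?. avoids another NullReferenceException.
        var desNodeText = doc.DocumentNode.Descendants("div").FirstOrDefault(o => o.GetAttributeValue("class", "") == "summary_text")?.InnerText;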
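
The exception in the question is thrown because SelectSingleNode returns null when the XPath matches nothing, and .InnerText is then read on that null. One plausible cause is that the HTML the server returns differs from the DOM shown in the browser's developer tools, so a positional XPath copied from the browser may not match. As a minimal sketch of an alternative (not the answerer's code), the same div can be selected by its class with XPath and guarded with a null check. The class name `SummaryExtractor` and the method `ExtractSummary` are hypothetical names used only for illustration, and the sample HTML is the snippet from the question rather than a live download.

```csharp
using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class SummaryExtractor
{
    // Hypothetical helper: returns the trimmed text of the first
    // <div class="summary_text">, or null when no such element exists.
    public static string ExtractSummary(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Select by class instead of by position in the page layout.
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='summary_text']");

        // SelectSingleNode returns null when nothing matches,
        // so check before touching InnerText.
        if (node == null)
        {
            return null;
        }

        return node.InnerText.Trim();
    }

    public static void Main()
    {
        // Sample input taken from the HTML shown in the question.
        string html =
            "<html><body><div class=\"summary_text\" itemprop=\"description\">\n" +
            "    Three girls are kidnapped by a man with a diagnosed 23 distinct personalities, " +
            "and must try and escape before the apparent emergence of a frightful new 24th.\n" +
            "</div></body></html>";

        string summary = ExtractSummary(html);
        Console.WriteLine(summary ?? "summary_text div not found");
    }
}
```

Returning null instead of throwing lets the caller decide what to show when the page layout changes. The XPath compares the whole class attribute, just as the equality check in the answer does, so it would not match a div that carries additional classes.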