Trying to get data from a website with HtmlAgilityPack

Time: 2017-03-02 11:24:32

Tags: c# html html-agility-pack

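I have tried many approaches but could not solve this. I don't know how to write the path to my node for the SelectSingleNode(?) method. I built a path (XPath) in my C# code to reach the node, but when I run the code I get a NullReferenceException because of that path. Can someone show me how to build the path correctly, or suggest another solution? Here is a sample of the HTML: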

<html>
    <body>
        <div id="container">
            <div id="box">
                <div class="box">
                    <div class="boxContent">
                        <div class="userBox">
                            <div class="userBoxContent">
                                <div class="userBoxElement">
                                    <ul id ="namePart">
                                        <li>
                                            <span class="namePartContent">

                                            </span>
                                        </li>
                                    </ul>
                                </div>
                            </div>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </body>
</html>

This is my C# code:

using System;
using System.Net;
using System.Text;
using HtmlAgilityPack;

namespace AgilityTrial
{
    class Program
    {
        static void Main(string[] args)
        {
            Uri url = new Uri("https://....");
            WebClient client = new WebClient();
            client.Encoding = Encoding.UTF8;
            string html = client.DownloadString(url);
            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml(html);

            // Absolute XPath: every step must match the downloaded markup exactly.
            string path = @"//html/body/div[@id='container']/div[@id='classifiedDetail']"+
                "/div[@class='classifiedDetail']/div[@class='classifiedDetailContent']"+
                "/div[@class='classifiedOtherBoxes']/div[@class='classifiedUserBox']"+
                "/div[@class='classifiedUserContent']/ul[@id='phoneInfoPart']/li"+
                "/span[@class='pretty-phone-part show-part']";
            // SelectSingleNode returns null when nothing matches the path.
            var tds = doc.DocumentNode.SelectSingleNode(path);
            // The NullReferenceException is thrown here when tds is null.
            var date = tds.InnerHtml;

            Console.WriteLine(date);
        }
    }
}

1 Answer:

Answer 0 (score: 1)

Take the namePartContent span node as an example. If you want to get that data, you only need to do this:

doc.DocumentNode.SelectSingleNode(".//span[@class='namePartContent']")?.InnerText;

It searches for and returns a single span node whose class is namePartContent, starting from the root node, which in your case is <html>.
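
For illustration, here is a minimal sketch of the same idea as a complete program. It is not the asker's real page: the inline markup, the extra class name "extra", the text "Jane Doe", and the class name NamePartExample are all made up for the example. It assumes the HtmlAgilityPack NuGet package is referenced. It shows two things that may matter here: @class='...' compares against the whole attribute value, so a span with several classes needs contains(), and SelectSingleNode returns null when nothing matches, so the result should be guarded before reading InnerText.

```csharp
using System;
using HtmlAgilityPack;

class NamePartExample
{
    static void Main()
    {
        // Hypothetical inline markup modeled on the question's sample;
        // a real page would be downloaded instead.
        const string html = @"<html><body><div id='container'>
            <ul id='namePart'><li>
                <span class='namePartContent extra'>Jane Doe</span>
            </li></ul></div></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Exact match: @class must equal the whole attribute value,
        // so this finds nothing for a span that carries two classes.
        HtmlNode exact = doc.DocumentNode.SelectSingleNode(
            "//span[@class='namePartContent']");

        // contains() also matches when the attribute lists several classes.
        HtmlNode loose = doc.DocumentNode.SelectSingleNode(
            "//ul[@id='namePart']//span[contains(@class, 'namePartContent')]");

        // SelectSingleNode returns null when nothing matches,
        // so guard with ?. and ?? instead of dereferencing directly.
        Console.WriteLine(exact?.InnerText.Trim() ?? "(no exact match)");
        Console.WriteLine(loose?.InnerText.Trim() ?? "(no match)");
    }
}
```

A short relative path anchored on an id, as in the second query, tends to survive layout changes better than a long absolute path in which every intermediate div has to match.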