Question

I'm getting started with writing some crawlers in C#, and I've heard that HtmlAgilityPack is the best solution.

I couldn't find any working usage examples, so maybe someone can help me with my problem.

In one class I have a method that fetches the part of the page I want, for example the ul with class "testable ul":

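public static string GetElement(string url, string element, string type, string name)
{
    HtmlWeb web = new HtmlWeb();
    HtmlDocument doc = web.Load(url);
    // Builds e.g. //ul[@class='testable ul'] (exact match on the whole attribute value)
    HtmlNode node = doc.DocumentNode.SelectSingleNode("//" + element + "[@" + type + "='" + name + "']");
    // SelectSingleNode returns null when nothing matches, so don't call OuterHtml on null
    return node?.OuterHtml;
}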
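With the arguments "ul", "class", "testable ul", GetElement builds the XPath `//ul[@class='testable ul']`. A minimal sketch that pulls that concatenation into a hypothetical `BuildXPath` helper (not part of HtmlAgilityPack) so the string can be checked on its own. It also adds a hypothetical `BuildClassTokenXPath` variant, an assumption rather than anything from the question, because `@class='…'` is a whole-string comparison and would not match `class="ul testable"`.

```csharp
// Hypothetical helpers (not part of HtmlAgilityPack) that only build XPath strings.
public static class XPathBuilder
{
    // Mirrors the concatenation in GetElement: matches elements whose attribute
    // value is exactly equal to `name` (whole-string comparison).
    public static string BuildXPath(string element, string type, string name)
    {
        return "//" + element + "[@" + type + "='" + name + "']";
    }

    // Alternative for class attributes: matches when `className` is one of the
    // space-separated class tokens, regardless of order or extra whitespace.
    public static string BuildClassTokenXPath(string element, string className)
    {
        return "//" + element +
               "[contains(concat(' ', normalize-space(@class), ' '), ' " + className + " ')]";
    }
}
```

Both strings use only XPath 1.0 functions, so either can be passed to `SelectSingleNode` or `SelectNodes`.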

So I call it like this:

string content = SiteMethods.GetElement(startPage, "ul", "class", "testable ul");

After that I do some other processing, but at the end I load that string into HtmlAgilityPack again:

HtmlDocument html = new HtmlDocument();
// OptionOutputAsXml only changes how the document is written out, not how it is parsed
html.OptionOutputAsXml = true;
html.LoadHtml(content);
HtmlNode document = html.DocumentNode;

This is where I'm stuck. The content string has the following structure:

<ul class="testable ul">
    <li>
        <a href="http://www.veryimportant.link">
            <div class="img">
                <img src="http://image.so.important/">
            </div>
            <div class="info">
                <span class="name">
                    NAME
                </span>
                <span class="price">10</span>
                <span class="price2">8</span>
                <span class="grade">C</span>
            </div>
            <p class="tips">tips</p>
        </a>
    </li>
    <li>
        <a href="http://www.veryimportant.link/2">
            <div class="img">
                <img src="http://image.so.important/2">
            </div>
            <div class="info">
                <span class="name">
                    NAME2
                </span>
                <span class="price">3</span>
                <span class="price2">4</span>
                <span class="grade">A</span>
            </div>
            <p class="tips">tips2</p>
        </a>
    </li>
</ul>
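Because every <li> has the same shape, one way to get each <li> as a separate object, and to map the whole list into a List, is to select the <li> nodes once and read each field relative to the current node. A minimal sketch, assuming the markup stays as shown; `Item` and `ItemParser` are hypothetical names, `content` is the string returned by GetElement, and the `p.tips` element is left out for brevity. The HtmlAgilityPack members used are `LoadHtml`, `SelectNodes`, `SelectSingleNode`, `GetAttributeValue`, `InnerText` and `HtmlEntity.DeEntitize`.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical type: one object per <li> in the "testable ul" list.
public class Item
{
    public string Link { get; set; }
    public string Image { get; set; }
    public string Name { get; set; }
    public string Price { get; set; }
    public string Price2 { get; set; }
    public string Grade { get; set; }
}

// Hypothetical helper that maps each <li> to an Item.
public static class ItemParser
{
    public static List<Item> Parse(string content)
    {
        var html = new HtmlDocument();
        html.LoadHtml(content);

        var items = new List<Item>();
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection liNodes = html.DocumentNode.SelectNodes("//ul[@class='testable ul']/li");
        if (liNodes == null)
            return items;

        foreach (HtmlNode li in liNodes)
        {
            // Paths starting with "." are evaluated relative to this <li>;
            // a path starting with "//" would search the whole document instead.
            items.Add(new Item
            {
                Link = li.SelectSingleNode(".//a")?.GetAttributeValue("href", ""),
                Image = li.SelectSingleNode(".//img")?.GetAttributeValue("src", ""),
                Name = Text(li, ".//span[@class='name']"),
                Price = Text(li, ".//span[@class='price']"),
                Price2 = Text(li, ".//span[@class='price2']"),
                Grade = Text(li, ".//span[@class='grade']")
            });
        }
        return items;
    }

    // Trimmed, entity-decoded text of the first match, or null if there is none.
    private static string Text(HtmlNode context, string xpath)
    {
        HtmlNode node = context.SelectSingleNode(xpath);
        return node == null ? null : HtmlEntity.DeEntitize(node.InnerText).Trim();
    }
}
```

On the structure above, `ItemParser.Parse(content)` should yield two items, the second with Name "NAME2" and Grade "A". Prices are kept as strings here; `int.TryParse` could convert them if needed.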

So my questions are:

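How do I get each <li> as a separate object, so I can work with it further?
Is it possible to get the links http://www.veryimportant.link and http://www.veryimportant.link/2 with one simple command, or, for example, the images http://image.so.important/ and http://image.so.important/2? How do I get them?
How do I get NAME and NAME2 into a list?
Is it possible to map the whole HTML structure to a list?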
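For the links, images and names, each list can come from a single XPath query over the parsed fragment (for example the `document` node from the question). A minimal sketch, assuming the structure shown above; `QuickSelect` and its methods are hypothetical names, while `SelectNodes`, `GetAttributeValue`, `InnerText` and `HtmlEntity.DeEntitize` are HtmlAgilityPack members.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper class: each method is one XPath query plus a projection.
public static class QuickSelect
{
    // SelectNodes returns null when nothing matches; turn that into an empty list.
    private static List<HtmlNode> Nodes(HtmlNode root, string xpath)
    {
        HtmlNodeCollection nodes = root.SelectNodes(xpath);
        return nodes == null ? new List<HtmlNode>() : nodes.ToList();
    }

    // href of every <a> that is a direct child of an <li>
    public static List<string> Links(HtmlNode root) =>
        Nodes(root, "//li/a[@href]").Select(a => a.GetAttributeValue("href", "")).ToList();

    // src of every <img> anywhere inside an <li>
    public static List<string> Images(HtmlNode root) =>
        Nodes(root, "//li//img[@src]").Select(img => img.GetAttributeValue("src", "")).ToList();

    // trimmed, entity-decoded text of every <span class="name">
    public static List<string> Names(HtmlNode root) =>
        Nodes(root, "//span[@class='name']")
            .Select(span => HtmlEntity.DeEntitize(span.InnerText).Trim())
            .ToList();
}
```

Called as `QuickSelect.Names(document)`, this should give the list NAME, NAME2; the `Trim()` is needed because the names are surrounded by whitespace in the markup.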

Please show a few examples; after that, learning the rest should be easy.

HtmlAgilityPack: working with the structure
