Question

C# + WebClient + HtmlAgilityPack + web parsing

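I want to read the list of jobs on this page, but I can't parse the links because they change.

For example, when I look at a link in the browser (Link), I see one URL, but when I download and parse the page with WebClient and HtmlAgilityPack, I get a different link.

Do I have to configure something on the WebClient, such as a session or script support?

Here is my code: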

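using System;
using System.IO;
using System.Net;
using HtmlAgilityPack;

private void getLinks()
{
    // categories.txt holds one category URL per line.
    using (StreamReader sr = new StreamReader("categories.txt"))
    using (WebClient wc = new WebClient())
    {
        while (!sr.EndOfStream)
        {
            string url = sr.ReadLine();
            if (string.IsNullOrWhiteSpace(url)) continue; // skip blank lines

            string source = wc.DownloadString(url);
            HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
            doc.LoadHtml(source);

            // SelectNodes returns null, not an empty collection, when nothing matches.
            HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(".//a[@class='internerLink primaerElement']");
            if (nodes == null) continue;

            foreach (HtmlNode node in nodes)
            {
                Console.WriteLine("http://jobboerse.arbeitsagentur.de" + node.Attributes["href"].Value);
            }
        }
    }
}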

Answer 1

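You could try the WebBrowser class (http://msdn.microsoft.com/en-us/library/system.windows.controls.webbrowser%28v=vs.110%29.aspx) and then use its DOM (see "Accessing DOM from WebBrowser") to retrieve the links. Because it hosts a real browser engine, it runs the page's scripts and keeps session cookies, which WebClient does not do by default.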

mshtml.IHTMLDocument2 htmlDoc = webBrowser.Document as mshtml.IHTMLDocument2;
// do something like find button and click
htmlDoc.all.item("testBtn").click();
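
The snippet above only clicks a button, so here is a rough sketch of how the same DOM access might be used to list the job links instead. It assumes a WPF `WebBrowser` and a project reference to `Microsoft.mshtml`. `LinkDumper` and `Attach` are hypothetical names made up for illustration, and the class name filter is simply copied from the XPath in the question. It has not been run against the actual site.

```csharp
using System;
using System.Windows.Controls;

// Hypothetical helper, not part of any library.
static class LinkDumper
{
    // Prints the job links once the given WebBrowser has finished loading a page.
    public static void Attach(WebBrowser webBrowser)
    {
        webBrowser.LoadCompleted += (sender, e) =>
        {
            // Document is typed as object in WPF; cast it to the mshtml DOM interface.
            var htmlDoc = webBrowser.Document as mshtml.IHTMLDocument2;
            if (htmlDoc == null) return;

            // links contains the <a href> and <area> elements of the loaded page.
            foreach (mshtml.IHTMLElement link in htmlDoc.links)
            {
                var anchor = link as mshtml.IHTMLAnchorElement;
                if (anchor != null && link.className == "internerLink primaerElement")
                {
                    // href is already resolved to an absolute URL by the browser.
                    Console.WriteLine(anchor.href);
                }
            }
        };
    }
}
```

Call `LinkDumper.Attach(webBrowser)` once, then `webBrowser.Navigate(url)` for each category URL. If the page rewrites its links with scripts that run after the load event, the output may still differ from what the browser eventually shows.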
