Question

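I am trying to download data from a website into a DataTable. The problem is that I cannot reach the right node, apparently because of whitespace. This is my code so far: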

    public static DataTable downloadtable()
    {
        DataTable dt = new DataTable();
        string htmlCode = "";
        using (WebClient client = new WebClient())
        {
            client.Headers.Add(HttpRequestHeader.UserAgent, "AvoidError");
            htmlCode = client.DownloadString("https://www.eex.com/en/Market%20Data/Trading%20Data/Power/Hour%20Contracts%20%7C%20Spot%20Hourly%20Auction/Area%20Prices/spot-hours-area-table/2013-08-22");
        }
        // Dump the HTML to a text file just to inspect its structure.
        // File.WriteAllText closes the file, so the content is always flushed to disk.
        System.IO.File.WriteAllText("c:\\temp\\test.txt", htmlCode);

        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();

        doc.LoadHtml(htmlCode);

        dt = new DataTable();

        foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table[@class='list electricity']/tr/th[@class='title'][.='Market Area']"))
        {
            //This is the problem name where I get the error
            foreach (HtmlNode row in table.SelectNodes("//td[@class='title'][.='            00-01          ']"))
            {

                foreach (var cell in row.SelectNodes("//td"))
                {
                    //this is to check for correct result, final result would be to dump it into datatable
                    Console.WriteLine(cell.InnerText);
                }
            }
        }
        return dt;
    }

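I am trying to download the hourly prices from the link in the code, but it seems to fail because of trailing whitespace (I think). Is there something like a LIKE statement for node names? Or can the trailing whitespace be removed?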

Answer 1

I think your problem is that you are trying to retrieve td nodes from inside a td node, which obviously has no td nodes of its own.

<tr>
 <td class="title">         00-01           </td>
 <td class="spacer"></td>
 <td class="r">€/MWh</td>
 <td class="spacer"></td>
 <td>35.34</td>
 <td class="spacer"></td>
 <td>34.02</td>
 <td class="spacer"></td>
 <td>34.02</td>
</tr>

So if you iterate over the result of table.SelectNodes("//td[@class='title'][.=' 00-01 ']"), those nodes will not contain any td.
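
The whitespace part of the question can also be checked in isolation. The sketch below compares an exact string test with normalize-space(.), which trims leading and trailing whitespace before comparing. It assumes a hypothetical SampleHtml fragment modelled on the row above and a hypothetical helper named CountMatches, neither of which is from the original post. It also relies on SelectNodes returning null when nothing matches (the Html Agility Pack default), which is worth guarding against before a foreach.

```csharp
using System;
using HtmlAgilityPack;

public static class WhitespaceMatchDemo
{
    // Hypothetical sample markup, modelled on the row shown above.
    public const string SampleHtml =
        "<table><tr>" +
        "<td class=\"title\">         00-01           </td>" +
        "<td class=\"spacer\"></td>" +
        "<td class=\"r\">€/MWh</td>" +
        "<td>35.34</td>" +
        "</tr></table>";

    // Counts the nodes selected by an XPath expression.
    // SelectNodes returns null (not an empty collection) when nothing matches.
    public static int CountMatches(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null ? 0 : nodes.Count;
    }
}
```

Calling CountMatches(WhitespaceMatchDemo.SampleHtml, "//td[@class='title'][.='00-01']") and then the same call with [normalize-space(.)='00-01'] shows the difference: the exact comparison sees the padded text, while the normalized one sees only 00-01.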

If you want all the rows of the table that contains the 00-01 row, you can use this:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlCode);
// Select the table that contains the 00-01 title cell (the variable is "doc", not "doc2").
foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//td[@class='title'][normalize-space(.)='00-01']/ancestor::table"))
{
    foreach (var cell in table.SelectNodes("./tr/td"))
    {
        if (string.IsNullOrEmpty(cell.InnerText.Trim()))
            continue;
        Console.WriteLine(cell.InnerText.Trim());
    }
}

If you only want the 00-01 row, you can use this:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlCode);
foreach (HtmlNode row in doc.DocumentNode.SelectNodes("//td[@class='title']"))
{
    if (row.InnerText.Trim() == "00-01")
    {
        foreach (var cell in row.ParentNode.ChildNodes)
        {
            if (string.IsNullOrEmpty(cell.InnerText.Trim()))
                continue;
            Console.WriteLine(cell.InnerText.Trim());
        }
    }
}

Or you could write it like this:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlCode);
foreach (HtmlNode row in doc.DocumentNode.SelectNodes("//td[@class='title'][normalize-space(.)='00-01']"))
{
    foreach (var cell in row.ParentNode.ChildNodes)
    {
        if (string.IsNullOrEmpty(cell.InnerText.Trim()))
            continue;
        Console.WriteLine(cell.InnerText.Trim());
    }
}
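
The question's stated end goal is to put the values into a DataTable rather than print them. A minimal sketch of that last step follows, assuming one string column per non-empty cell. The names RowToDataTable, ExtractRow and Column0, Column1, ... are illustrative and not from the original post.

```csharp
using System;
using System.Data;
using System.Linq;
using HtmlAgilityPack;

public static class RowToDataTable
{
    // Returns a DataTable holding the non-empty cells of the row whose
    // title cell equals hourLabel after whitespace normalization.
    // Assumption: hourLabel contains no single quote, because it is
    // spliced directly into the XPath expression.
    public static DataTable ExtractRow(string html, string hourLabel)
    {
        var dt = new DataTable();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode title = doc.DocumentNode.SelectSingleNode(
            "//td[@class='title'][normalize-space(.)='" + hourLabel + "']");
        if (title == null)
            return dt; // no such row: return an empty table

        // Take the td siblings of the title cell (including itself),
        // decode entities, trim, and drop the empty spacer cells.
        string[] values = title.ParentNode.Elements("td")
            .Select(td => HtmlEntity.DeEntitize(td.InnerText).Trim())
            .Where(text => text.Length > 0)
            .ToArray();

        // Illustrative column names; the real headers are not known here.
        for (int i = 0; i < values.Length; i++)
            dt.Columns.Add("Column" + i, typeof(string));

        dt.Rows.Add((object[])values);
        return dt;
    }
}
```

With the htmlCode string downloaded in the question, the call would be RowToDataTable.ExtractRow(htmlCode, "00-01"). Looping over the other hour labels and adding each result as a row of one shared table would be a natural extension.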
