我试图将网站上的数据下载到数据表中。问题是我无法访问正确的节点,因为似乎有空白空间。到目前为止,这是我的代码:
public static DataTable downloadtable()
{
DataTable dt = new DataTable();
string htmlCode = "";
using (WebClient client = new WebClient())
{
client.Headers.Add(HttpRequestHeader.UserAgent, "AvoidError");
htmlCode = client.DownloadString("https://www.eex.com/en/Market%20Data/Trading%20Data/Power/Hour%20Contracts%20%7C%20Spot%20Hourly%20Auction/Area%20Prices/spot-hours-area-table/2013-08-22");
}
//this is just to check the file structure from text file
System.IO.StreamWriter file = new System.IO.StreamWriter("c:\\temp\\test.txt");
file.WriteLine(htmlCode);
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlCode);
dt = new DataTable();
foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table[@class='list electricity']/tr/th[@class='title'][.='Market Area']"))
{
//This is the problem name where I get the error
foreach (HtmlNode row in table.SelectNodes("//td[@class='title'][.=' 00-01 ']"))
{
foreach (var cell in row.SelectNodes("//td"))
{
//this is to check for correct result, final result would be to dump it into datatable
Console.WriteLine(cell.InnerText);
}
}
}
return dt;
}
我试图从代码中的链接下载小时价格但由于尾随空白(我认为)似乎失败了。 是否有类似于节点名称的声明?或者你可以删除尾随空白吗?
答案 0 :(得分:1)
我认为您的问题是,您正试图从td
节点内部检索td
,这显然没有更多td
的节点。
<tr>
<td class="title"> 00-01 </td>
<td class="spacer"></td>
<td class="r">€/MWh</td>
<td class="spacer"></td>
<td>35.34</td>
<td class="spacer"></td>
<td>34.02</td>
<td class="spacer"></td>
<td>34.02</td>
</tr>
因此,如果您尝试使用结果table.SelectNodes("//td[@class='title'][.=' 00-01 ']")
进行迭代,则其中不会包含任何td。
如果您想要从00-01 开始的所有行,您可以使用此行:
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlCode);
foreach (HtmlNode row in doc2.DocumentNode.SelectNodes("//td[@class='title'][(normalize-space(.)='00-01')]/ancestor::table"))
{
foreach (var cell in row.SelectNodes("./tr/td"))
{
if (string.IsNullOrEmpty(cell.InnerText.Trim()))
continue;
Console.WriteLine(cell.InnerText.Trim());
}
}
如果您只想要00-01行,可以使用此行:
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlCode);
foreach (HtmlNode row in doc.DocumentNode.SelectNodes("//td[@class='title']"))
{
if (row.InnerText.Trim() == "00-01")
{
foreach (var cell in row.ParentNode.ChildNodes)
{
if (string.IsNullOrEmpty(cell.InnerText.Trim()))
continue;
Console.WriteLine(cell.InnerText.Trim());
}
}
}
或者您可以将其用作:
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlCode);
foreach (HtmlNode row in doc2.DocumentNode.SelectNodes("//td[@class='title'][(normalize-space(.)='00-01')]"))
{
foreach (var cell in row.ParentNode.ChildNodes)
{
if (string.IsNullOrEmpty(cell.InnerText.Trim()))
continue;
Console.WriteLine(cell.InnerText.Trim());
}
}