I want to pick out all the elements of the table with id = statsTable and read all of its data into a CSV file.
Here is what I have so far:
    // Requires: using System; using System.IO; using System.Linq;
    //           using System.Net; using HtmlAgilityPack;

    // Create a request for the URL.
    WebRequest request = WebRequest.Create("http://www.pgatour.com/stats/stat.120.html");
    Console.WriteLine("Requesting data from: http://www.pgatour.com/stats/stat.120.html");
    // If required by the server, set the credentials.
    request.Credentials = CredentialCache.DefaultCredentials;
    WebResponse response = request.GetResponse();
    using (Stream stream = response.GetResponseStream())
    using (StreamReader reader = new StreamReader(stream))
    {
        // Convert the HTML response to a string.
        String responseString = reader.ReadToEnd();
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(responseString);
        var desktopFolder = Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory);
        var fullFileName = Path.Combine(desktopFolder, "GolfStats.csv");
        using (var PlayerFile = new StreamWriter(fullFileName))
        {
            PlayerFile.WriteLine("Data downloaded: " + DateTime.Now);
            // Find the table whose id attribute is "statsTable".
            var myTable = doc.DocumentNode
                .Descendants("table")
                .Where(table => table.Attributes.Contains("id"))
                .SingleOrDefault(table => table.Attributes["id"].Value == "statsTable");
            // Write every cell of the table on its own line.
            var myTableValues = myTable.Descendants("td");
            foreach (var tdV in myTableValues)
            {
                PlayerFile.WriteLine(tdV.InnerText);
                Console.WriteLine(tdV.InnerText);
            }
            PlayerFile.Flush();
        }
    }
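The problem is that my CSV lists all the data in a single column, and it also picks up the ads embedded in the table (see the URL in the WebRequest). It would be great if you could help me output the data in table format!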
Answer 0 (score: 1)
You are writing a new line for every table cell. To write one line per table row instead, replace
    var myTableValues = myTable.Descendants("td");
    foreach (var tdV in myTableValues)
    {
        PlayerFile.WriteLine(tdV.InnerText);
        Console.WriteLine(tdV.InnerText);
    }
with
    var myTableRows = myTable.Descendants("tr").Where(tr => tr.Attributes.Contains("id"));
    foreach (var tr in myTableRows)
    {
        string line = string.Join(";", tr.Descendants("td").Select(td => td.InnerText));
        PlayerFile.WriteLine(line);
        Console.WriteLine(line);
    }
The filter
    .Where(tr => tr.Attributes.Contains("id"))
removes the ads, because the table rows that hold ads have no id attribute, while all player rows do.
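One thing the snippet above does not handle: InnerText can contain HTML entities, stray whitespace, or the ";" delimiter itself, and any of these would shift columns in the resulting file. Below is a minimal sketch of a helper that cleans and quotes each cell before joining. The class name CsvLine and its methods EscapeField and JoinRow are hypothetical names made up for this illustration, not part of HtmlAgilityPack, and the quoting rule (wrap in double quotes, double any embedded quotes) is the common CSV convention rather than anything the page requires.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;

// Hypothetical helper for illustration; not part of the original answer.
public static class CsvLine
{
    // Decodes HTML entities, collapses whitespace runs to single spaces,
    // and quotes the field if it contains the delimiter or a double quote.
    public static string EscapeField(string rawInnerText, char delimiter = ';')
    {
        string text = WebUtility.HtmlDecode(rawInnerText ?? string.Empty);

        // Splitting on a null separator array splits on all whitespace characters;
        // rejoining with a single space also trims both ends.
        text = string.Join(" ", text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries));

        bool needsQuotes = text.IndexOf(delimiter) >= 0 || text.IndexOf('"') >= 0;
        return needsQuotes
            ? "\"" + text.Replace("\"", "\"\"") + "\""
            : text;
    }

    // Builds one CSV line from the raw cell texts of a single table row.
    public static string JoinRow(IEnumerable<string> cells, char delimiter = ';')
    {
        return string.Join(delimiter.ToString(), cells.Select(c => EscapeField(c, delimiter)));
    }
}
```

Inside the foreach loop from the answer, the line would then be built as `string line = CsvLine.JoinRow(tr.Descendants("td").Select(td => td.InnerText));`. Whether the stats table actually contains such characters is an assumption; if it does not, the plain string.Join version produces the same output.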