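OK, I need to query a live website to get data from a table, put this HTML table into a DataTable, and then use that data. So far I have managed to use Html Agility Pack and XPath to get to each row in the table I need, but I know there must be a way to parse it into a DataTable. (C#) The code I am currently using is: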
string htmlCode = "";
using (WebClient client = new WebClient())
{
    htmlCode = client.DownloadString("http://www.website.com");
}
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlCode);

//My attempt at LINQ to solve the issue (not sure where to go from here)
var myTable = doc.DocumentNode
    .Descendants("table")
    .Where(t => t.Attributes["summary"].Value == "Table One")
    .FirstOrDefault();

//Finds all the odd rows (which are the ones I actually need, but I would prefer a
//DataTable containing all the rows!)
foreach (HtmlNode cell in doc.DocumentNode.SelectNodes("//tr[@class='odd']/td"))
{
    string test = cell.InnerText;
    //Have not gone further than this yet!
}
The HTML table on the website I am querying looks like this:
<table summary="Table One">
  <tbody>
    <tr class="odd">
      <td>Some Text</td>
      <td>Some Value</td>
    </tr>
    <tr class="even">
      <td>Some Text1</td>
      <td>Some Value1</td>
    </tr>
    <tr class="odd">
      <td>Some Text2</td>
      <td>Some Value2</td>
    </tr>
    <tr class="even">
      <td>Some Text3</td>
      <td>Some Value3</td>
    </tr>
    <tr class="odd">
      <td>Some Text4</td>
      <td>Some Value4</td>
    </tr>
  </tbody>
</table>
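I am not sure whether it is better or easier to use LINQ + HAP or XPath + HAP to get the desired result. I have tried both, with limited success, as you can probably see. This is the first time I have written a program that queries a website, or interacts with a website in any way, so I am very unsure at the moment! Thanks in advance for any help :)
Answer 0 (score: 8)
Using some of Jack Eker's code above and some code from Mark Gravell (see post here), I managed to come up with a solution. This snippet fetches the South African public holidays for 2012, as published at the time of writing.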
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Web;
using System.Net;
using HtmlAgilityPack;
namespace WindowsFormsApplication
{
    public partial class Form1 : Form
    {
        private DataTable dt;

        public Form1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {
            string htmlCode;
            using (WebClient client = new WebClient())
            {
                // Send a User-Agent header; the request apparently fails without one.
                client.Headers.Add(HttpRequestHeader.UserAgent, "AvoidError");
                htmlCode = client.DownloadString("http://www.info.gov.za/aboutsa/holidays.htm");
            }

            HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
            doc.LoadHtml(htmlCode);

            dt = new DataTable();
            dt.Columns.Add("Name", typeof(string));
            dt.Columns.Add("Value", typeof(string));

            // Only the table with id "table2" holds the data we want.
            HtmlNode table = doc.DocumentNode.SelectSingleNode("//table[@id='table2']");
            // SelectNodes returns null (not an empty collection) when nothing matches.
            HtmlNodeCollection rows = table == null ? null : table.SelectNodes(".//tr");
            if (rows != null)
            {
                foreach (HtmlNode row in rows)
                {
                    HtmlNodeCollection cells = row.SelectNodes("td");
                    if (cells == null || cells.Count < 2)
                        continue; // skip header rows and rows without two cells

                    // InnerText keeps HTML entities, so replace &nbsp; by hand.
                    dt.Rows.Add(
                        cells[0].InnerText.Replace("&nbsp;", " "),
                        cells[1].InnerText.Replace("&nbsp;", " "));
                }
            }

            // Bind once, after all rows have been added.
            dataGridView1.DataSource = dt;
        }
    }
}
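Answer 1 (score: 4)
Html Agility Pack has no method for this out of the box, but it should not be too hard to create one. There are samples out there that go from LINQ to XML to a DataTable. These can be reworked into what you need.
I can help create the whole method if needed, but not today :).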
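As a rough illustration of what such a method might look like, here is a minimal sketch. The class and method names (HtmlTableConverter, ToDataTable) and the generated column names ("Column1", "Column2", ...) are made up for this example and are not part of Html Agility Pack. It assumes every column is a string and that the table contains no nested tables, since ".//tr" would also pick up rows of inner tables.

```csharp
using System;
using System.Data;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper; not shipped with Html Agility Pack.
public static class HtmlTableConverter
{
    public static DataTable ToDataTable(HtmlNode table)
    {
        var dt = new DataTable();
        if (table == null) return dt;

        // ".//tr" matches rows whether or not they sit inside a <tbody>.
        // SelectNodes returns null when nothing matches.
        HtmlNodeCollection rows = table.SelectNodes(".//tr");
        if (rows == null) return dt;

        foreach (HtmlNode row in rows)
        {
            // Elements("td") yields only direct <td> children and is never null.
            string[] cells = row.Elements("td")
                .Select(td => HtmlEntity.DeEntitize(td.InnerText).Trim())
                .ToArray();
            if (cells.Length == 0) continue; // e.g. a row that only has <th> cells

            // Grow the column list so the widest row seen so far fits.
            while (dt.Columns.Count < cells.Length)
                dt.Columns.Add("Column" + (dt.Columns.Count + 1), typeof(string));

            dt.Rows.Add(cells);
        }
        return dt;
    }
}

public static class Demo
{
    public static void Main()
    {
        // A shortened copy of the table from the question.
        string html =
            "<table summary=\"Table One\"><tbody>" +
            "<tr class=\"odd\"><td>Some Text</td><td>Some Value</td></tr>" +
            "<tr class=\"even\"><td>Some Text1</td><td>Some Value1</td></tr>" +
            "</tbody></table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//table[@summary='Table One']");

        DataTable dt = HtmlTableConverter.ToDataTable(node);
        Console.WriteLine(dt.Rows.Count);    // 2
        Console.WriteLine(dt.Columns.Count); // 2
    }
}
```

Selecting the table by XPath first and handing the node to the helper keeps the "which table" decision separate from the conversion, so the same helper can be reused for the odd-rows-only case by filtering the resulting DataTable afterwards.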
Answer 2 (score: 3)
Here is my solution. It may be a bit messy, but it works fine for now :D
string htmlCode;
using (WebClient client = new WebClient())
{
    client.Headers.Add(HttpRequestHeader.UserAgent, "AvoidError");
    htmlCode = client.DownloadString("http://www.website.com");
}

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlCode);

DataTable dt = new DataTable();
dt.Columns.Add("Name", typeof(string));
dt.Columns.Add("Value", typeof(decimal));

// SelectNodes returns null when nothing matches, so guard before iterating.
var rows = doc.DocumentNode.SelectNodes("//table[@summary='Table Name']/tbody/tr");
if (rows != null)
{
    foreach (var row in rows)
    {
        var cells = row.SelectNodes("td");
        if (cells == null || cells.Count < 2)
            continue; // skip rows that do not have a name and a value cell

        DataRow dr = dt.NewRow();
        // InnerText keeps HTML entities, so replace &nbsp; by hand.
        dr["Name"] = cells[0].InnerText.Replace("&nbsp;", " ");

        // Turn "1.234,56" into "1234.56", then parse independently of the
        // machine's culture (needs: using System.Globalization;).
        string number = cells[1].InnerText.Replace(".", "").Replace(",", ".");
        decimal rowValue;
        if (decimal.TryParse(number, NumberStyles.Number, CultureInfo.InvariantCulture, out rowValue))
        {
            dr["Value"] = rowValue;
        }
        // If parsing fails, "Value" is left as DBNull.
        dt.Rows.Add(dr);
    }
}
Answer 3 (score: 1)
Simple logic for converting an HTML table to a DataTable:
// Note: this uses the Coded UI Test API
// (Microsoft.VisualStudio.TestTools.UITesting.HtmlControls), not Html Agility Pack.
// "parent" is assumed to be a control defined elsewhere.

//Define your webtable
public static HtmlTable table
{
    get
    {
        HtmlTable htmlTable = new HtmlTable(parent);
        htmlTable.SearchProperties.Add("id", "searchId");
        return htmlTable;
    }
}

//Convert a webtable to datatable
public static DataTable getTable
{
    get
    {
        DataTable dtTable = new DataTable("TableName");
        UITestControlCollection rows = table.Rows;

        // The first row supplies the column names.
        UITestControlCollection headers = rows[0].GetChildren();
        foreach (HtmlHeaderCell header in headers)
        {
            if (header.InnerText != null)
                dtTable.Columns.Add(header.InnerText);
        }

        // Every remaining row becomes one DataRow.
        for (int i = 1; i < rows.Count; i++)
        {
            UITestControlCollection cells = rows[i].GetChildren();
            string[] data = new string[cells.Count];
            int counter = 0;
            foreach (HtmlCell cell in cells)
            {
                if (cell.InnerText != null)
                    data[counter] = cell.InnerText;
                counter++;
            }
            dtTable.Rows.Add(data);
        }
        return dtTable;
    }
}
Answer 4 (score: 0)
You can try
DataTable.Rows[i].Cells[j].InnerText;
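where DataTable is the id of the table (an HTML table control, not a System.Data.DataTable, whose rows have no Cells property), i is the row index, and j is the cell index.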