I have a table like this:
<table>
<tbody>
<tr>
<td>Header1</td>
<td>Header2</td>
<td>Header3</td>
<td>Header4</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
</tr>
<tr>
<td>11</td>
<td>22</td>
<td>33</td>
<td>44</td>
</tr>
</tbody>
</table>
My code is:
var headersList = xmlDoc.XPathSelectElements("//table//tbody//tr").ToList();
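But headersList gives me all the td values :-(
Now I would like to know how to loop over this table. My expected results are as follows.
Expected result of the first iteration: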
Header1 = 1,
Header2 = 2,
Header3 = 3,
Header4 = 4,
Expected result of the second iteration:
Header1 = 11
Header2 = 22
Header3 = 33
Header4 = 44
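Any help would be greatly appreciated.
Answer 0 (score: 1)
Since the markup gives you no way to tell the header row from the data rows (there is no thead, and every row sits in the same tbody), you need to work out the number of columns programmatically.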
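// Requires: using System; using System.Collections.Generic; using System.Linq;
//           using System.Xml.Linq; using System.Xml.XPath;
var data = @"<table>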
<tbody>
<tr>
<td>Header1</td>
<td>Header2</td>
<td>Header3</td>
<td>Header4</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
</tr>
<tr>
<td>11</td>
<td>22</td>
<td>33</td>
<td>44</td>
</tr>
</tbody>
</table>";
var xDoc = XDocument.Parse(data);
var headerElements = xDoc.XPathSelectElements("//table//tbody//tr");
int headerCount = headerElements.First().Descendants().Count();
var nodes = headerElements.SelectMany(x => x.Descendants())
.Select(x => x.Value)
.ToList();
var head = nodes.Take(headerCount).ToList();
var body = nodes.Skip(headerCount).ToList();
var pairs = new List<Tuple<string,string>>();
for(var i = 0; i < body.Count; i += headerCount)
{
for(int j = 0; j < head.Count; j++)
{
pairs.Add(Tuple.Create(head[j], body[i+j]));
}
}
foreach(var pair in pairs)
{
Console.WriteLine("{0} = {1}", pair.Item1, pair.Item2);
}
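The flat list of pairs above loses the row boundaries, whereas the question asks for one result per loop iteration. The following is a minimal sketch of a row-by-row variant, not part of the original answer. The class name `TableRows` and the method `Parse` are hypothetical, and the sketch assumes the input is well-formed XML with at least one `tr` whose cells are direct `td` children.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class TableRows
{
    // Hypothetical helper: returns one list of (header, value) pairs per data row.
    // Assumes the first <tr> holds the headers and every later <tr> holds data.
    public static List<List<KeyValuePair<string, string>>> Parse(string xml)
    {
        // Read each <tr> as a list of its <td> texts, keeping rows separate.
        var rows = XDocument.Parse(xml)
            .Descendants("tr")
            .Select(tr => tr.Elements("td").Select(td => td.Value).ToList())
            .ToList();

        var headers = rows[0]; // throws if the table has no rows at all

        // Zip pairs each header with the cell in the same column;
        // it stops at the shorter of the two sequences.
        return rows.Skip(1)
            .Select(cells => headers
                .Zip(cells, (h, c) => new KeyValuePair<string, string>(h, c))
                .ToList())
            .ToList();
    }
}
```

Each element of the returned list corresponds to one iteration in the question: loop over the outer list for the rows, and over the inner list to print `Key = Value` for each column.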
Answer 1 (score: 0)
Use HtmlAgilityPack (available from NuGet) to parse the HTML document. Here is a sample that writes the table data to the console:
// Requires: using System; using System.Linq; using HtmlAgilityPack;
var doc = new HtmlDocument();
doc.Load(path_to_html);
var rows =
doc.DocumentNode.SelectNodes("//table/tbody/tr")
.Select(tr => tr.SelectNodes("td").Select(td => td.InnerHtml).ToList())
.ToList();
Printing the output:
var headers = rows[0];
// skip first row which contains headers
foreach (var row in rows.Skip(1))
{
for (int i = 0; i < row.Count; i++)
if (headers.Count > i) // you can remove this check if data is valid
Console.WriteLine("{0} = {1}", headers[i], row[i]);
}
Result:
Header1 = 1
Header2 = 2
Header3 = 3
Header4 = 4
Header1 = 11
Header2 = 22
Header3 = 33
Header4 = 44
If you need to define headers for the columns, I suggest using the thead tag:
<table>
<thead>
<tr>
<td>Header1</td>
<td>Header2</td>
<td>Header3</td>
<td>Header4</td>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
</tr>
<tr>
<td>11</td>
<td>22</td>
<td>33</td>
<td>44</td>
</tr>
</tbody>
</table>
In that case, parsing and output would look like this:
var headers = doc.DocumentNode.SelectNodes("//table/thead/tr/td")
.Select(td => td.InnerHtml).ToList();
var rows =
doc.DocumentNode.SelectNodes("//table/tbody/tr")
.Select(tr => tr.SelectNodes("td").Select(td => td.InnerHtml).ToList())
.ToList();
foreach (var row in rows)
{
for(int i = 0; i < row.Count; i++)
Console.WriteLine("{0} = {1}", headers[i], row[i]);
}
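If the markup is well-formed XML, the same thead/tbody split can also be sketched with LINQ to XML, which the question already uses, instead of HtmlAgilityPack. This is an illustration under stated assumptions rather than part of the answer above: the class `TheadTable` and the method `Format` are hypothetical names, `table` is assumed to be the root element, and both `thead` and `tbody` are assumed to be present. The `Math.Min` guard is an addition that keeps a row with more cells than headers from indexing past the header list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class TheadTable
{
    // Hypothetical helper: returns one "Header = value" line per body cell.
    public static List<string> Format(string xml)
    {
        var table = XDocument.Parse(xml).Root; // assumes <table> is the root element

        // Header texts come only from <thead>, so no row needs to be skipped later.
        var headers = table.Element("thead")
            .Descendants("td")
            .Select(td => td.Value)
            .ToList();

        var lines = new List<string>();
        foreach (var tr in table.Element("tbody").Elements("tr"))
        {
            var cells = tr.Elements("td").Select(td => td.Value).ToList();

            // Only pair up columns that exist in both the header row and this row.
            int count = Math.Min(headers.Count, cells.Count);
            for (int i = 0; i < count; i++)
                lines.Add(string.Format("{0} = {1}", headers[i], cells[i]));
        }
        return lines;
    }
}
```

Returning the lines instead of writing them directly keeps the parsing separate from the output; passing each line to `Console.WriteLine` prints them in the same `Header = value` format as the result above.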