C#: looping over a table gives all the values at once?

Asked: 2013-06-12 11:57:49

Tags: c# xml linq-to-xml html-agility-pack xmldocument

I have a table like this:

<table>
  <tbody>
    <tr>
       <td>Header1</td>
       <td>Header2</td>
       <td>Header3</td>
       <td>Header4</td>
    </tr>
    <tr>
       <td>1</td>
       <td>2</td>
       <td>3</td>
       <td>4</td>
    </tr>
    <tr>
       <td>11</td>
       <td>22</td>
       <td>33</td>
       <td>44</td>
    </tr>

  </tbody>
</table>

My code is:

var headersList = xmlDoc.XPathSelectElements("//table//tbody//tr").ToList();

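But headersList gives me all the td values :-(

Now I would like to know how to loop over this table. My expected result is as follows.

Expected result of the first loop: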

Header1 = 1,
Header2 = 2,
Header3 = 3,
Header4 = 4,

Expected result of the second loop:

Header1 = 11
Header2 = 22
Header3 = 33
Header4 = 44

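Any help would be greatly appreciated.

2 Answers:

Answer 0: (score: 1)

Because the markup does not separate the header from the body (there is no thead, only a tbody), you need to determine the number of columns programmatically from the first row.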

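// Namespaces this snippet needs; XPathSelectElements is an extension method from System.Xml.XPath.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

var data = @"<table>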
  <tbody>
    <tr>
       <td>Header1</td>
       <td>Header2</td>
       <td>Header3</td>
       <td>Header4</td>
    </tr>
    <tr>
       <td>1</td>
       <td>2</td>
       <td>3</td>
       <td>4</td>
    </tr>
    <tr>
       <td>11</td>
       <td>22</td>
       <td>33</td>
       <td>44</td>
    </tr>

  </tbody>
</table>";

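var xDoc = XDocument.Parse(data);
// All <tr> rows; the first one holds the headers.
var rowElements = xDoc.XPathSelectElements("//table//tbody//tr");
// Count only the direct <td> children so nested markup inside a cell is not miscounted.
int headerCount = rowElements.First().Elements("td").Count();
// Flatten every cell of every row into one list of strings.
var nodes = rowElements.SelectMany(x => x.Elements("td"))
                       .Select(x => x.Value)
                       .ToList();
// The first headerCount values are the headers; the rest are data cells.
var head = nodes.Take(headerCount).ToList();
var body = nodes.Skip(headerCount).ToList();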

var pairs = new List<Tuple<string,string>>();

for(var i = 0; i < body.Count; i += headerCount)
{
    for(int j = 0; j < head.Count; j++)
    {
        pairs.Add(Tuple.Create(head[j], body[i+j]));
    }
}

foreach(var pair in pairs)
{
    Console.WriteLine("{0} = {1}", pair.Item1, pair.Item2);
}
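
If you would rather keep the values grouped per row (which is closer to the "first loop / second loop" output asked for), here is a minimal sketch using only LINQ to XML. The class name `TableRows` and the method `Parse` are made up for illustration, and the sketch assumes the first `tr` holds the headers and that no data row has more cells than the header row.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class TableRows
{
    // Returns one list of (header, value) pairs per data row.
    // Assumption: the first <tr> is the header row and the table is not empty.
    public static List<List<KeyValuePair<string, string>>> Parse(string xml)
    {
        // Read every row as a list of its direct <td> texts.
        var rows = XDocument.Parse(xml)
            .Descendants("tr")
            .Select(tr => tr.Elements("td").Select(td => td.Value).ToList())
            .ToList();

        var headers = rows[0];

        // Pair each cell with the header at the same column index.
        return rows.Skip(1)
            .Select(cells => cells
                .Select((value, i) => new KeyValuePair<string, string>(headers[i], value))
                .ToList())
            .ToList();
    }

    public static void Main()
    {
        const string xml = @"<table><tbody>
            <tr><td>Header1</td><td>Header2</td></tr>
            <tr><td>1</td><td>2</td></tr>
            <tr><td>11</td><td>22</td></tr>
        </tbody></table>";

        foreach (var row in Parse(xml))
        {
            foreach (var cell in row)
                Console.WriteLine("{0} = {1}", cell.Key, cell.Value);
            Console.WriteLine(); // blank line between rows
        }
    }
}
```

Each inner list corresponds to one pass of the outer loop, so you can handle a row as a unit instead of walking one flat list of pairs.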

Answer 1: (score: 0)

Use HtmlAgilityPack (available from NuGet) to parse the HTML document. Here is an example that prints the table data to the console:

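var doc = new HtmlDocument();
doc.Load(path_to_html); // path_to_html: path of the HTML file to parse
// Build one List<string> of cell contents per <tr>.
// Note: SelectNodes returns null when nothing matches the XPath.
var rows =
      doc.DocumentNode.SelectNodes("//table/tbody/tr")
         .Select(tr => tr.SelectNodes("td").Select(td => td.InnerHtml).ToList())
         .ToList();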

Output:

var headers = rows[0];

// skip first row which contains headers
foreach (var row in rows.Skip(1))
{
    for (int i = 0; i < row.Count; i++)
        if (headers.Count > i) // you can remove this check if data is valid
            Console.WriteLine("{0} = {1}", headers[i], row[i]);
}

Result:

Header1 = 1
Header2 = 2
Header3 = 3
Header4 = 4

Header1 = 11
Header2 = 22
Header3 = 33
Header4 = 44
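
As a side note, the `headers.Count > i` check in the output loop can also be expressed with `Enumerable.Zip`, which stops at the shorter of the two sequences. The sketch below is independent of HtmlAgilityPack and works on plain lists of strings; `HeaderPairing` and `Format` are illustrative names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HeaderPairing
{
    // Pairs each cell with the header at the same position.
    // Zip stops at the shorter sequence, so surplus cells or surplus
    // headers are dropped instead of causing an out-of-range exception.
    public static List<string> Format(IEnumerable<string> headers, IEnumerable<string> row)
    {
        return headers.Zip(row, (h, v) => string.Format("{0} = {1}", h, v)).ToList();
    }

    public static void Main()
    {
        var headers = new List<string> { "Header1", "Header2", "Header3" };
        var rows = new List<List<string>>
        {
            new List<string> { "1", "2", "3" },
            new List<string> { "11", "22" },                 // short row
            new List<string> { "111", "222", "333", "444" }  // long row
        };

        foreach (var row in rows)
        {
            foreach (var line in Format(headers, row))
                Console.WriteLine(line);
            Console.WriteLine(); // blank line between rows
        }
    }
}
```

Whether silently dropping mismatched cells is what you want depends on your data. If a ragged row indicates a real problem, an explicit length comparison that throws is probably the better choice.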

If you need to define headers for the columns, I suggest using the thead tag:

<table>
  <thead>
    <tr>
       <td>Header1</td>
       <td>Header2</td>
       <td>Header3</td>
       <td>Header4</td>
    </tr>
  </thead>
  <tbody>
    <tr>
       <td>1</td>
       <td>2</td>
       <td>3</td>
       <td>4</td>
    </tr>
    <tr>
       <td>11</td>
       <td>22</td>
       <td>33</td>
       <td>44</td>
    </tr>
  </tbody>
</table>

In that case, parsing and output would look like this:

var headers = doc.DocumentNode.SelectNodes("//table/thead/tr/td")
                 .Select(td => td.InnerHtml).ToList();
var rows = 
      doc.DocumentNode.SelectNodes("//table/tbody/tr")
         .Select(tr => tr.SelectNodes("td").Select(td => td.InnerHtml).ToList())
         .ToList();

foreach (var row in rows)
{
    for(int i = 0; i < row.Count; i++)
        Console.WriteLine("{0} = {1}", headers[i], row[i]);
}
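
If the markup is well-formed XML, as in the question, the same split can be sketched with LINQ to XML alone. The version below takes the headers from `thead` when it exists and otherwise falls back to the first `tbody` row. `TheadTable` and `Split` are hypothetical names, and the sketch assumes `table` is the root element and contains at least one row.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class TheadTable
{
    // Returns (headers, data rows). Headers come from <thead> when present,
    // otherwise from the first <tbody> row, which is then excluded from the data.
    public static Tuple<List<string>, List<List<string>>> Split(string xml)
    {
        XElement table = XDocument.Parse(xml).Root; // assumes <table> is the root
        List<XElement> bodyRows = table.Elements("tbody").Elements("tr").ToList();
        XElement headRow = table.Elements("thead").Elements("tr").FirstOrDefault();

        if (headRow == null)
        {
            // No <thead>: treat the first body row as the header row.
            headRow = bodyRows.First();
            bodyRows = bodyRows.Skip(1).ToList();
        }

        var headers = Cells(headRow);
        var rows = bodyRows.Select(tr => Cells(tr)).ToList();
        return Tuple.Create(headers, rows);
    }

    // Text of the direct <td> children of a row.
    private static List<string> Cells(XElement tr)
    {
        return tr.Elements("td").Select(td => td.Value).ToList();
    }

    public static void Main()
    {
        const string xml = @"<table>
            <thead><tr><td>Header1</td><td>Header2</td></tr></thead>
            <tbody>
              <tr><td>1</td><td>2</td></tr>
              <tr><td>11</td><td>22</td></tr>
            </tbody></table>";

        var result = Split(xml);
        foreach (var row in result.Item2)
        {
            for (int i = 0; i < row.Count; i++)
                Console.WriteLine("{0} = {1}", result.Item1[i], row[i]);
            Console.WriteLine(); // blank line between rows
        }
    }
}
```

The fallback lets the same code handle both the original table and the one with a thead, at the cost of assuming that a table without thead always starts with a header row.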