Parsing a table with LINQ and HtmlAgilityPack

Time: 2014-05-23 07:39:18

Tags: c# linq html-agility-pack

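How can I use LINQ to parse the HTML of a web page and get the InnerHtml values out of a table?

I am using HtmlAgilityPack and would like to parse some values as cleanly as possible.

The numbers you see (00000, 00001, 00002, ...) are the agents' unique numbers.

So maybe there is a way to use LINQ to find these numbers and get the following values from the TDs

(name, 123, state and info) => 00000, John, 123, IDLE, coffee, so that I can access them individually and work with them, maybe in an array?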

</TH>
    </TR>
    <TR ALIGN=RIGHT>
        <TD ALIGN=LEFT>00000</TD>
        <TD ALIGN=LEFT>John</TD>
        <TD ALIGN=CENTER>123</TD>
        <TD ALIGN=LEFT>IDLE</TD>
        <TD ALIGN=LEFT>coffee</TD>
    </TR>
    <TR ALIGN=RIGHT>
        <TD ALIGN=LEFT>00001</TD>
        <TD ALIGN=LEFT>Lisa</TD>
        <TD ALIGN=CENTER>123</TD>
        <TD ALIGN=LEFT>IDLE</TD>
        <TD ALIGN=LEFT>coffee</TD>
    </TR>
    <TR ALIGN=RIGHT>
        <TD ALIGN=LEFT>00002</TD>
        <TD ALIGN=LEFT>Mary</TD>
        <TD ALIGN=CENTER>123</TD>
        <TD ALIGN=LEFT>IDLE</TD>
        <TD ALIGN=LEFT>coffee</TD>
    </TR>
    <TR ALIGN=RIGHT>
        <TD ALIGN=LEFT>00003</TD>
        <TD ALIGN=LEFT>Tim</TD>
        <TD ALIGN=CENTER>123</TD>
        <TD ALIGN=LEFT>IDLE</TD>
        <TD ALIGN=LEFT>coffee</TD>
    </TR>
....

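Thanks in advance!

2 answers:

Answer 0 (score: 1)

This looks a lot like a "please give me the code I need" question, which I dislike very much. Have a look at the following and make sure you understand it: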

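var doc = ... // Load the document
// HtmlAgilityPack exposes element names in lower case (HtmlNode.Name), so query with "tr" and "td"
var trs = doc.DocumentNode.Descendants("tr"); // Gives you all the TRs
foreach (var tr in trs)
{
  var tds = tr.Descendants("td").ToArray(); // Get all the TDs of this row
  if (tds.Length < 5) continue; // Skip the header row (TH cells only) and incomplete rows
  // Turn them into our data structure
  var data = new {
             Id     = tds[0].InnerText, // The agent's unique number
             Name   = tds[1].InnerText,
             Number = tds[2].InnerText,
             State  = tds[3].InnerText,
             Info   = tds[4].InnerText,
             };
  // Do something with data
}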

Using LINQ only:

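var data = from tr in doc.DocumentNode.Descendants("tr")
           let tds = tr.Descendants("td").ToArray()
           where tds.Length >= 5 // Skip the header row (TH cells only) and incomplete rows
           select new {
             Id     = tds[0].InnerText, // The agent's unique number
             Name   = tds[1].InnerText,
             Number = tds[2].InnerText,
             State  = tds[3].InnerText,
             Info   = tds[4].InnerText,
             };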
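
For completeness, here is a minimal end-to-end sketch of the same idea, including loading the document and looking an agent up by number. It assumes the HtmlAgilityPack NuGet package is referenced. The Agent class, the AgentTable.Parse helper, and the header cell texts in the sample HTML are made up for illustration, since the question only shows a fragment of the table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // third-party NuGet package "HtmlAgilityPack"

// Hypothetical container for one table row (not part of the question)
public sealed class Agent
{
    public string Id, Name, Number, State, Info;
}

public static class AgentTable
{
    // Parses every row that has at least five TD cells and indexes the result by agent number.
    public static Dictionary<string, Agent> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return (from tr in doc.DocumentNode.Descendants("tr")
                let tds = tr.Descendants("td").Select(td => td.InnerText.Trim()).ToArray()
                where tds.Length >= 5 // the header row has TH cells only, so it is skipped
                select new Agent
                {
                    Id = tds[0],
                    Name = tds[1],
                    Number = tds[2],
                    State = tds[3],
                    Info = tds[4]
                })
               .ToDictionary(agent => agent.Id);
    }

    public static void Main()
    {
        // Assumption: a complete table rebuilt around two rows from the question;
        // the header texts are invented.
        const string html = @"
<TABLE>
    <TR><TH>Id</TH><TH>Name</TH><TH>Number</TH><TH>State</TH><TH>Info</TH></TR>
    <TR ALIGN=RIGHT>
        <TD ALIGN=LEFT>00000</TD><TD ALIGN=LEFT>John</TD><TD ALIGN=CENTER>123</TD>
        <TD ALIGN=LEFT>IDLE</TD><TD ALIGN=LEFT>coffee</TD>
    </TR>
    <TR ALIGN=RIGHT>
        <TD ALIGN=LEFT>00001</TD><TD ALIGN=LEFT>Lisa</TD><TD ALIGN=CENTER>123</TD>
        <TD ALIGN=LEFT>IDLE</TD><TD ALIGN=LEFT>coffee</TD>
    </TR>
</TABLE>";

        var agents = Parse(html);
        var lisa = agents["00001"];
        Console.WriteLine(lisa.Name + " is " + lisa.State);
    }
}
```

A dictionary keyed by the agent number is just one option; if you only need the rows in document order, ending the query with ToArray() instead gives you the array mentioned in the question.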

Answer 1 (score: 0)

@flindeberg gave a perfectly reasonable answer (+1 to him/her); you can avoid the ToArray like this:

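private class Row
{
    public string Id { get; set; }
    public string Name { get; set; }
    public int Number { get; set; }
    public string State { get; set; }
    public string Info { get; set; }
}

...

// One setter per column, in the same order as the TDs of a row
var mappings = new Action<string, Row>[]
{
    (value, row) => row.Id = value,
    (value, row) => row.Name = value,
    (value, row) => row.Number = int.Parse(value),
    (value, row) => row.State = value,
    (value, row) => row.Info = value
};

var doc = ... // Load the document
var trs = doc.DocumentNode.Descendants("tr"); // Gives you all the TRs
foreach (var tr in trs)
{
  var row = new Row();
  // Zip is lazy: Count() enumerates it so that the mappings actually run,
  // and tells us how many cells were mapped
  var mapped = tr.Descendants("td").Zip(mappings, (td, map) =>
  {
      map(td.InnerText, row);
      return true;
  }).Count();
  if (mapped < mappings.Length) continue; // Skip the header row (it has no TDs)

  // You now have a populated row.
}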
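
One thing to keep in mind with a Zip-based approach like this: Enumerable.Zip uses deferred execution, so the lambda passed to it only runs while the resulting sequence is being enumerated. The small sketch below uses only the standard library to show the effect; the ZipDemo class and its sample values are made up for illustration.

```csharp
using System;
using System.Linq;

public static class ZipDemo
{
    // Returns how often the Zip lambda ran before and after enumerating the result.
    public static int[] CountCalls()
    {
        var cells = new[] { "John", "123" };
        var positions = new[] { 1, 2 };
        var calls = 0;

        var zipped = cells.Zip(positions, (cell, pos) =>
        {
            calls++; // side effect, comparable to filling a row object
            return pos + ": " + cell;
        });
        var before = calls; // still 0: nothing has been enumerated yet

        var results = zipped.ToList(); // enumerating runs the lambda once per pair
        var after = calls;

        return new[] { before, after };
    }

    public static void Main()
    {
        var counts = CountCalls();
        Console.WriteLine("before: " + counts[0] + ", after: " + counts[1]);
        // prints "before: 0, after: 2"
    }
}
```

If the result of Zip is never enumerated (with foreach, ToList(), Count() and so on), side effects inside the lambda never happen.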