Question

I have a simple HTML table:

<table>
  <tr>
    <td>
      <a href="http://someurl_1.com">item name1</a>
    </td>
    <td>
      Value 1
    </td>
  </tr>
  <tr>
    <td>
      <a href="http://someurl_2.com">item name2</a>
    </td>
    <td>
      Value 2
    </td>
  </tr>
</table>

Now I need the data from this table as a List<List<string>> (or string[][]).

To get it I use:

        List<List<string>>
            table = doc.DocumentNode.SelectSingleNode("//table")
                    .Descendants("tr")
                    .Skip(1)
                    .Where(tr => tr.Elements("td").Count() > 1)
                    .Select(tr => tr.Elements("td").Select(td => td.InnerText.Trim()).ToList())
                    .ToList();

It works, but it only picks up the text, so as a result I have:

table[0][0] -> item name1
table[0][1] -> value 1
table[1][0] -> item name2
table[1][1] -> value 2

But the URLs are not in that array.

How can I get the table values so that the result is:

table[0][0] -> http://someurl_1.com
table[0][1] -> item name1
table[0][2] -> value 1
table[1][0] -> http://someurl_2.com
table[1][1] -> item name2
table[1][2] -> value 2

Any help is appreciated! Thanks.

Answer 1

I would suggest using an XPath for each cell and mapping its data into the array.

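For example, the XPath for the second item is /html/body/table/tbody/tr[2]/td[1]/a. Note that this path assumes a full page whose table contains a tbody element. HtmlAgilityPack parses the markup as written and does not add tbody on its own, so for the bare table in the question the equivalent path would be //table/tr[2]/td[1]/a.

  var doc = new HtmlAgilityPack.HtmlDocument();
  doc.LoadHtml(htmlText);
  var nodes = doc.DocumentNode.SelectNodes("/html/body/table/tbody/tr[2]/td[1]/a");

This gives you <a href="http://someurl_2.com">item name2</a> as a node (SelectNodes returns null if nothing matches), which you can then query further to get the URL or the text.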
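
To get the [url, name, value] layout the question asks for, the same idea can be applied to every row at once. The following is only a sketch, not a definitive implementation: the class name TableLinks and the method Extract are hypothetical names introduced here, and it assumes the link always sits in the first cell of a row. It leaves out the Skip(1) from the question's query because the sample table has no header row; rows with fewer than two td cells are skipped instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TableLinks
{
    // Hypothetical helper (not from the original post): returns one list per row,
    // laid out as [href, text of cell 1, text of cell 2, ...].
    public static List<List<string>> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var table = new List<List<string>>();

        // "//table//tr" matches rows whether or not the markup contains a tbody.
        var rows = doc.DocumentNode.SelectNodes("//table//tr");
        if (rows == null) return table; // SelectNodes returns null when nothing matches

        foreach (var tr in rows)
        {
            var cells = tr.Elements("td").ToList();
            if (cells.Count < 2) continue; // skip header rows and incomplete rows

            // Assumption: the link, if any, is inside the first cell.
            var link = cells[0].SelectSingleNode(".//a");

            var row = new List<string>();
            // Empty string when the cell has no link or the link has no href.
            row.Add(link?.GetAttributeValue("href", "") ?? "");
            row.AddRange(cells.Select(td => td.InnerText.Trim()));
            table.Add(row);
        }

        return table;
    }
}
```

Calling TableLinks.Extract(htmlText) on the table from the question should yield two rows of three strings each, with the href first, then the link text, then the value cell.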

Get links from a table with HtmlAgilityPack
