Getting a specific row from an HTML table using PowerShell

Time: 2014-11-10 14:20:33

Tags: html xml powershell

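Using PowerShell, I am reading an HTML table from which I need to extract certain data. So far I have read all the rows into an array, $elements. I can access each row with @($elements)[rownum]. Each row has 41 cells: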

 <td title="1/1/0001" role="gridcell" aria-describedby="AvailJQGrid_TARCreateDt" style="display: none;">1/1/0001</td>
 <td title="14060700421840" role="gridcell" aria-describedby="AvailJQGrid_OraRowScn" style="display: none;">14060700421840</td>
 <td title="1BC09064EF10431D9F54FEF9BA36B0A5" role="gridcell" aria-describedby="AvailJQGrid_AvailSAID" style="display: none;">1BC09064EF10431D9F54FEF9BA36B0A5</td>
 <td title="6837758D8E6542619DF23CF5EF4928C5" role="gridcell" aria-describedby="AvailJQGrid_ActivitySAID" style="display: none;">6837758D8E6542619DF23CF5EF4928C5</td>

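Each aria-describedby attribute is unique within a row. Right now I iterate over all 41 cells looking for the one I want, then grab its innerHTML or textContent. Is there a way to access the cell I want directly, instead of iterating?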

I get all the cells with this command: $cells = $element.getElementsByTagName("td")

Is there something like $cells.GetInnerHtmlWithAttribute("aria-describedby", "AvailJQGrid_ActivitySAID")?

1 Answer:

Answer 0: (score: 1)

Treat it as XML and use XPath to get the element you are interested in:

$TableRow = [xml]@'
<tr>
 <td title="1/1/0001" role="gridcell" aria-describedby="AvailJQGrid_TARCreateDt" style="display: none;">1/1/0001</td>
 <td title="14060700421840" role="gridcell" aria-describedby="AvailJQGrid_OraRowScn" style="display: none;">14060700421840</td>
 <td title="1BC09064EF10431D9F54FEF9BA36B0A5" role="gridcell" aria-describedby="AvailJQGrid_AvailSAID" style="display: none;">1BC09064EF10431D9F54FEF9BA36B0A5</td>
 <td title="6837758D8E6542619DF23CF5EF4928C5" role="gridcell" aria-describedby="AvailJQGrid_ActivitySAID" style="display: none;">6837758D8E6542619DF23CF5EF4928C5</td>
</tr>
'@

$InterestingTD = $TableRow.SelectNodes('//td[@aria-describedby = "AvailJQGrid_ActivitySAID"]')

You can also use the Select-Xml cmdlet instead of .SelectNodes():

$InterestingTDselect = Select-Xml -Xml $TableRow -XPath '//td[@aria-describedby = "AvailJQGrid_ActivitySAID"]'
$InterestingTD = $InterestingTDselect.Node
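
To get the cell's text rather than the node itself, the same XPath lookup can be wrapped in a small function. This is only a sketch: Get-CellText is a hypothetical helper name, not something PowerShell provides, and it assumes the row markup is well-formed XML (real-world HTML often is not, in which case the [xml] cast throws). It uses SelectSingleNode, which returns the first match or $null, and reads the node's InnerText.

```powershell
# Hypothetical helper: returns the text of the <td> whose aria-describedby
# attribute equals $Id, or nothing when no such cell exists.
function Get-CellText {
    param(
        [Parameter(Mandatory = $true)] [string] $RowHtml,  # one <tr>...</tr>; must be well-formed XML
        [Parameter(Mandatory = $true)] [string] $Id        # assumed to contain no single quotes
    )

    # The cast throws if the markup is not well-formed XML.
    $row = [xml]$RowHtml

    # SelectSingleNode returns the first matching node, or $null if none matches.
    $cell = $row.SelectSingleNode("//td[@aria-describedby = '$Id']")

    if ($null -ne $cell) {
        $cell.InnerText
    }
}

# Sample row reduced to two of the cells from the question.
$rowHtml = @'
<tr>
 <td title="14060700421840" role="gridcell" aria-describedby="AvailJQGrid_OraRowScn" style="display: none;">14060700421840</td>
 <td title="6837758D8E6542619DF23CF5EF4928C5" role="gridcell" aria-describedby="AvailJQGrid_ActivitySAID" style="display: none;">6837758D8E6542619DF23CF5EF4928C5</td>
</tr>
'@

Get-CellText -RowHtml $rowHtml -Id 'AvailJQGrid_ActivitySAID'
```

With a live DOM element, the string passed as -RowHtml would presumably come from the row's outerHTML, which may need cleaning up before it parses as XML.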