Question

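I am trying to scrape a table from a website, and my final output has to be the table data from the first column.

The table is structured as shown in the image below: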

The rows I am interested in have the classes row and alt. Running the code below also returns three unwanted cells: one from the first row (align="right"), one from the second row (class="gna"), and one from the last row, which is configured exactly like the first row (align="right").

wb := ComObjCreate("InternetExplorer.Application")
wb.Visible := True
wb.Navigate("C:\Users\Marian\Downloads\webpage.htm")
; Wait for page to load:
While wb.Busy or wb.ReadyState != 4
    Sleep, 100

Table := wb.document.getElementById("gvSearchResults")
Rows := Table.rows
Loop % rows.length
{
    cells := rows[A_Index-1].cells
    out .= cells["0"].innerText ","
    out := RTrim(out,",") "`n"
}
Msgbox, %out%

How can I add more filters so that the output is limited to the cells I need? Thanks!

LE: I think querySelectorAll() and getElementsByClassName do not work because the HTML protocol of this web page does not support them.

Beginning of the HTML code:

[HTML snippet not preserved]

Answer 1

I would use querySelectorAll to get all the .row and .alt rows. After that, you can skip the last row in the loop:

rows := wb.document.querySelectorAll("#gvSearchResults tr.row, #gvSearchResults tr.alt")
Loop % rows.length - 1 ; -1 added to skip the last row
{
    cells := rows[A_Index-1].cells
    out .= cells["0"].innerText ","
    out := RTrim(out,",") "`n"
}
Msgbox, %out%

Edit, based on the comments:

The page in question has a <META content="IE=7.0000" http-equiv="X-UA-Compatible"> tag, which forces IE to run in IE7 compatibility mode. IE7 does not support querySelectorAll.
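To confirm which mode IE has actually chosen for the page, one option is to read the document's documentMode property (available in IE8 and later). This is only a minimal sketch: it reuses the file path from the question, and the variable name mode is introduced here for illustration.

```autohotkey
; Sketch: report the document mode IE is using for the loaded page.
; Assumes IE8 or later is installed, since documentMode does not exist before IE8.
wb := ComObjCreate("InternetExplorer.Application")
wb.Visible := True
wb.Navigate("C:\Users\Marian\Downloads\webpage.htm")
; Wait for page to load:
While wb.Busy or wb.ReadyState != 4
    Sleep, 100

mode := wb.document.documentMode ; hypothetical variable name
MsgBox, Document mode: %mode%
```

A value of 7 would indicate that the page is rendered in IE7 mode, in which querySelectorAll is unavailable.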

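The following solution also works when IE runs in IE7 mode. It is not very flexible, because you have to know in advance which rows need to be skipped.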

wb := ComObjCreate("InternetExplorer.Application")
wb.Visible := True
wb.Navigate("C:\Users\Marian\Downloads\webpage.htm")
; Wait for page to load:
While wb.Busy or wb.ReadyState != 4
    Sleep, 100

Table := wb.document.getElementById("gvSearchResults")
Rows := Table.rows
Loop % rows.length - 1 ; -1 added to skip the last row
{
    ; Skip the first and second iterations of the loop,
    ; effectively skipping the first and second rows of the table
    if (A_Index = 1 OR A_Index = 2)
        continue

    cells := rows[A_Index-1].cells
    out .= cells["0"].innerText ","
    out := RTrim(out,",") "`n"
}
Msgbox, %out%
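If hard-coding the row positions is too brittle, a possible alternative that should still work in IE7 mode is to loop over every row and keep only those whose className is row or alt. This is a sketch under assumptions, not a tested solution for the page in question: it assumes the wanted rows carry exactly one class (row or alt) and that the unwanted first, second and last rows do not. The variable names row and cls are introduced here for illustration.

```autohotkey
; Sketch: filter table rows by class name instead of by position.
; Assumption: wanted rows have className exactly "row" or "alt";
; header and pager rows have some other (or no) class.
wb := ComObjCreate("InternetExplorer.Application")
wb.Visible := True
wb.Navigate("C:\Users\Marian\Downloads\webpage.htm")
; Wait for page to load:
While wb.Busy or wb.ReadyState != 4
    Sleep, 100

rows := wb.document.getElementById("gvSearchResults").rows
out := ""
Loop % rows.length
{
    row := rows[A_Index-1]          ; DOM collections are zero-based
    cls := row.className
    if (cls != "row" and cls != "alt")
        continue                    ; not a data row, skip it
    out .= row.cells[0].innerText "`n"  ; first column only
}
MsgBox, %out%
```

If a row can carry several classes at once (for example "row selected"), the exact comparison above would miss it, and a substring or regular-expression check on cls would be needed instead.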
