Question

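I'm trying to target a table inside a tr of an outer table. The first tr of the outer table contains a table that holds an image identifying the table's data. There are several outer tables with different images, one per data type. I can locate the image in Nokogiri using: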

page.css('table tr table tr img[src="images/bicycle.gif"]')

I want to get the data, which sits in a table in the third tr of the outer table. I can locate all of the data on the page using:

page.css('table[bgcolor="#FFFFFF"] tr[valign="top"]')

But this also pulls in data from the other data types (for example, under "cars.gif").

How do I combine these searches to find only the bicycle data? I basically want to say "extract the text from the tr valign=top in table bgcolor=#ffffff that is a sibling of the tr containing img src=bicycle.gif".

Here's a sample of the HTML:

<table>
  <tr>
    <td><img src="images/spacer.gif" width="1" height="10" /></td>
  </tr>
  <tr>
    <td>
      <table>
        <tr>
          <td><img src="images/bicycle.gif" /></td>
        </tr>
      </table>
    </td>
  </tr>
  <tr>
    <td><img src="images/spacer.gif" width="100" height="10" /></td>
  </tr>
  <tr>
    <td>
      <table width="532">
        <tr>
          <td>Info</td>
        </tr>
      </table>
      <table bgcolor="#FFFFFF">
        <tr valign="top">
          <td>Bicycle Name</td>
        </tr>
      </table>
    </td>
  </tr>
</table>

Answer 1

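"Extract the text from the tr with valign=top in the table with bgcolor=#ffffff that is a sibling of the tr containing img src=bicycle.gif"

A slight correction based on the sample HTML:

"Extract the text from the tr with valign=top in the table with bgcolor=#ffffff that is contained in a tr which is a sibling of a tr that itself contains img src=bicycle.gif"

Translated to XPath: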

page.xpath('//tr[preceding-sibling::tr//img/@src = "images/bicycle.gif"]//table[@bgcolor="#FFFFFF"]//tr[@valign="top"]').text.strip

#=> "Bicycle Name"

Note that for the sample you provided, you need either [bgcolor="#FFFFFF"] or [valign="top"], but not both. Since relying on hard-coded styling is not ideal, the less of it you specify, the better.

Answer 2

It's still messy, but I think it's more readable with CSS:

page.at('img[src="images/bicycle.gif"]').ancestors('tr')[1].at('~ tr tr[valign=top] td').text
#=> "Bicycle Name"

Nokogiri: Targeting a sibling of a table with an image

2 answers: