I want to extract items from this table:
<table style="margin: auto;width: 800px" id="myTable" class="tablesorter">
<thead>
<tr class="TableHeader">
<th >Game</th><th>Icon</th><th>Achievement</th>
<th>Achievers</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="Steam_Game_Info.php?AppID=440"><img alt="Logo" src="http://cdn.akamai.steamstatic.com/steamcommunity/public/images/apps/440/07385eb55b5ba974aebbe74d3c99626bda7920b8.jpg" width=133 height=50 ></a></td>
<td> <table>
<tr>
<td class="AchievementBox" style="background-color: #347C17">
<a href="Steam_Achievement_Info.php?AchievementID=169&AppID=440"> <img alt="Icon" src="http://cdn.akamai.steamstatic.com/steamcommunity/public/images/apps/440/924764eea604817d3c14de9640ae6422c7cdfb7a.jpg" height='50' width='50'>
</a> </td>
</tr>
</table>
</td>
<td style="text-align: left" ><a href="Steam_Achievement_Info.php?AchievementID=169&AppID=440">Race for the Pennant</a><br>Run 25 kilometers.</td>
<td style="text-align: right">35505</td><td style="text-align: right">1.3</td>
The table's id is myTable, so what I want to do is:
go inside <tbody>
for each <tr> in table:
do something; maybe go inside <td> or get a link from <href>
I have:
require 'mechanize'
agent = Mechanize.new
page = agent.get("http://astats.astats.nl/astats/TopListAchievements.php?DisplayType=2")
puts page.body
This prints the page, but how do I actually iterate over the table rows?
Answer 0 (score: 2)
Use a CSS selector to print the text and the href attribute value of each link:
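require 'nokogiri'
# Parse the body fetched by Mechanize in the question.
doc = Nokogiri::HTML(page.body)
# td[3] is a non-standard shorthand that Nokogiri accepts for td:nth-child(3), here the Achievement column.
doc.css('table#myTable tbody td[3] a').each do |a|
  puts a.text, a[:href]
end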