This may just be a syntax question.
I can't figure out how to match only the table rows whose id starts with rowId_.
agent = Mechanize.new
pageC1 = agent.get("/customStrategyScreener!list.action")
The table has class="tableCellDT".
pageC1.search('table.tableCellDT tr[@id=rowId_]') # parses OK but returns 0 rows since rowId_ is not matched exactly.
pageC1.search('table.tableCellDT tr[@id=rowId_*]') # Throws an error since * is not treated like a wildcard string match
Sample HTML:
<table id="row" cellpadding="5" class="tableCellDT" cellspacing="1">
<thead>
<tr>
<th class="tableHeaderDT">#</th>
<th class="tableHeaderDT sortable">
<a href="?d-16544-s=1&d-16544-o=2&d-16544-p=1">Screener</a></th>
<th class="tableHeaderDT sortable">
<a href="?d-16544-s=2&d-16544-o=2&d-16544-p=1">Strategy</a></th>
<th class="tableHeaderDT"> </th></tr></thead>
<tbody>
<tr id="rowId_BullPut" class="odd">
<td> 1 </td>
<td> Bull</td>
<td></td>
<td><a href="link1?model.itemId=2262">Edit</a>
<a href="javascript:deleteScreener('link2?model.itemId=2262');">Delete</a>
<a href="link3?model.itemId=2262&amp;model.source=list">View</a>
</td></tr>
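Note that pageC1 is a Mechanize::Page object, not a Nokogiri object. Sorry that wasn't clear at first.
Mechanize::Page has no #css or #xpath methods, but you can extract the Nokogiri doc from it (Mechanize uses one internally anyway).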
Answer 0 (score: 2)
To get the tr elements whose id starts with "rowId_":
pageC1.search('//tr[starts-with(@id, "rowId_")]')
Answer 1 (score: 1)
You want the CSS3 attribute starts-with selector:
pageC1.css('table.tableCellDT tr[id^="rowId_"]')
Or the XPath starts-with() function:
pageC1.xpath('.//table[@class="tableCellDT"]//tr[starts-with(@id,"rowId_")]')
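Although Nokogiri's Node#search method intelligently chooses between CSS and XPath selector syntax based on what you write, that does not mean you can mix CSS and XPath selector syntax in the same query.
In action: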
>> require 'nokogiri'
#=> true
>> doc = Nokogiri.HTML <<ENDHTML; true #hide output from IRB
">> <table class="foo"><tr id="rowId_nonono"><td>Nope</td></tr></table>
">> <table class="tableCellDT">
">> <tr id="rowId_yesyes"><td>Yes1</td></tr>
">> <tr id="rowId_andme2"><td>Yes2</td></tr>
">> <tr id="rowIdNONONO"><td>Needs underscore</td></tr>
">> </table>
">> ENDHTML
#=> true
>> doc.css('table.tableCellDT tr[id^="rowId_"]').map(&:text)
#=> ["Yes1", "Yes2"]
>> doc.xpath('.//table[@class="tableCellDT"]//tr[starts-with(@id,"rowId_")]').map(&:text)
#=> ["Yes1", "Yes2"]
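Answer 2 (score: 0)
Thanks to http://nokogiri.org/Nokogiri/XML/Node.html#method-i-css
and the answers above, here is the final code. It solved my problem: I only needed to get the desired rows and then read certain pieces of information from each of them: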
pageC1.search('//tr[starts-with(@id, "rowId_")]').each do |row|
  # Read the string after _ in rowId_, part of the "id" in <tr>
  rid = row.attribute("id").text.split("_")[1] # => "BullPut"
  # Get the URL of the 3rd <a> link in <td> cell 4
  # (the parsed attribute value has the &amp; entity decoded to &)
  link = row.css("td[4] a[3]")[0].attributes["href"].text # => "link3?model.itemId=2262&model.source=list"
end