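I'm trying to scrape the "Last sale", "Sale date" and "Currently for sale" values from the HTML below, excluding everything inside the span with id "sold-prices":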
<div class="seperate">
<h2>Public info</h2>
<p>
<strong>Property type:</strong> Semi-detached house |
<strong>Tenure:</strong> Leasehold |
<strong>Last sale:</strong> £71,000 | <strong>Sale date:</strong> 5th Dec 2007 - <a href="" class="toggle_sold_prices">Previous sales</a>
<span id="sold-prices" class="none">
<br>
<strong>Property type:</strong>
Semi-detached house |
<strong>Tenure:</strong>
Leasehold |
<strong>Previous sale:</strong> £75,000 |
<strong>Sale date:</strong>
3rd Oct 2006
<br>
<strong>Property type:</strong>
Semi-detached house |
<strong>Tenure:</strong>
Leasehold |
<strong>Previous sale:</strong> £36,000 |
<strong>Sale date:</strong>
26th Sep 2002
<br>
<strong>Property type:</strong>
Semi-detached house |
<strong>Tenure:</strong>
Leasehold |
<strong>Previous sale:</strong> £39,950 |
<strong>Sale date:</strong>
27th Jan 1995
<span class="new-build">New build</span>
</span>
| <a href="/for-sale/details/42175871"><i class="icon icon-home nolink"></i>Currently for sale</a>
</p>
</div>
I know I can select the div to get its HTML on its own, but I don't know how to scrape the data for just the tags I want while ignoring
<span id="sold-prices" class="none">
Any ideas?
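Answer 0 (score: 2)
Once Nokogiri has parsed the HTML, it's easy to find and manipulate nodes. Sometimes that means selectively removing nodes to simplify the DOM. This is one of those times: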
require 'nokogiri'
doc = Nokogiri::HTML(<<EOT)
<div class="seperate">
<p>
<strong>Property type:</strong> Semi-detached house |
<strong>Tenure:</strong> Leasehold |
<strong>Last sale:</strong> £71,000 | <strong>Sale date:</strong> 5th Dec 2007 - <a href="" class="toggle_sold_prices">Previous sales</a>
<span id="sold-prices" class="none">
<br>
<strong>Property type:</strong>
Semi-detached house |
<strong>Tenure:</strong>
Leasehold |
</span>
</p>
</div>
EOT
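# Remove the "Previous sales" span so its <strong> tags are not picked up.
doc.at('#sold-prices').remove
# Pair each remaining <strong> label with the text node that follows it.
data = doc.search('strong').map { |strong|
  [strong.text, strong.next_sibling.text.tr('|', '').strip]
}.to_h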
data # => {"Property type:"=>"Semi-detached house", "Tenure:"=>"Leasehold", "Last sale:"=>"£71,000", "Sale date:"=>"5th Dec 2007 -"}
The trick is:
doc.at('#sold-prices').remove
which gets rid of the forest so you can see the trees you want.
The resulting data needs a bit more cleanup, but the rest of the code should be self-explanatory, so adjusting it should be easy.
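As one possible form of that cleanup, the sketch below takes the hash printed above and normalizes it: it drops the trailing colon from each label, strips the dangling " -" from the sale date, and converts the price and date to Integer and Date values. The helper name clean_sale_data and the choice of output types are assumptions for illustration, and the price handling assumes whole-pound amounts:

```ruby
require 'date'

# Hypothetical helper: tidy the label/value pairs scraped from the page.
def clean_sale_data(raw)
  raw.each_with_object({}) do |(label, value), out|
    key = label.sub(/:\z/, '')        # "Sale date:" -> "Sale date"
    value = value.sub(/\s*-\z/, '')   # drop the trailing " -" left before the link
    out[key] =
      case key
      when 'Last sale'
        value.delete('^0-9').to_i     # keep digits only: "£71,000" -> 71000
      when 'Sale date'
        # Remove the ordinal suffix ("5th" -> "5") so strptime can parse it.
        Date.strptime(value.sub(/(\d+)(?:st|nd|rd|th)/, '\1'), '%d %b %Y')
      else
        value
      end
  end
end

raw = {
  'Property type:' => 'Semi-detached house',
  'Tenure:' => 'Leasehold',
  'Last sale:' => '£71,000',
  'Sale date:' => '5th Dec 2007 -'
}
cleaned = clean_sale_data(raw)
```

Keeping the conversion in a separate step means the scraping code stays a plain label-to-text mapping, and the parsing rules can change without touching the Nokogiri part.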