I have this HTML. Note that everything is nested inside the .listing div:
<div id="listing_1085130_featured" class="item listing 1085130 even featured selected" data-blockindex="0" se:map:point="40.7219,-74.0034" se:map="map" se:behavior="selectable hoverable rememberable clickable mappable" style="cursor: pointer;">
<div class="item_inner ">
<div class="featured_tag hidden-xs">Featured Listing</div>
<div class="selected_marker hidden-xs hidden-sm">
<div id="results_list" class="photo">
<a href="/building/27-wooster/ph?featured=1">
<img border="0" src="https://s3.amazonaws.com/img.streeteasy.com/nyc/image/47/76017947.jpg" alt="27 Wooster Street #PH">
</a>
<div id="featured-tag-on-responsive" class="visible-xs">Featured Listing</div>
</div>
<div class="details">
<div class="details_title">
<h5>
<a se:clickable:target="true" href="/building/27-wooster/ph?featured=1">27 Wooster Street #PH</a>
</h5>
<div class="item_tools">
</div>
<div class="closer"></div>
<div class="details_info first_detail_info">
<div class="details_info">
<div class="details_info">
<div class="details_info">
</div>
<div class="closer"></div>
</div>
</div>
....
I have many of these. How do I get the href of the first link inside #results_list, which in this case is /building/27-wooster/ph?featured=1?
This is my approach so far:
require 'json'
require 'open-uri'
require 'nokogiri'
def scrape(page_number)
doc = Nokogiri::HTML(URI.open("http://streeteasy.com/for-sale/soho?page=#{page_number}&sort_by=price_desc"))
doc.css(".listing").each do |listing|
# grab data inside that specific listing
end
end
Is there a way to search within that listing? Something like listing.children("#results_list a").first.href
Answer 0 (score: 0)
This worked for me:
doc.css("#results_list/a").each do |listing|
p listing['href']
end
To get only the first match, use at_css. Replacing the code above with this line returns the href of the first matching link:
doc.at_css("#results_list/a")['href']
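Answer 1 (score: 0)
Is there a way to search within that listing?
Yes, but in HTML an id must be unique within the page, so it is suspicious that every .listing div contains a div with id="results_list". However, Nokogiri does not seem to have a problem with multiple identical ids: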
require 'nokogiri'
html = <<'END_OF_HTML'
<div class="item listing 1085130 even featured selected">
<div>
<div id="results_list" class="photo">
<a href="/building/27-wooster/ph?featured=1">hello</a>
<a href="#">apple</a>
</div>
</div>
</div>
<div class="item listing 1085131 even featured selected">
<div>
<div id="results_list" class="photo">
<a href="/building/27-wooster/ph?featured=1">world</a>
<a href="#">cherry</a>
</div>
</div>
</div>
<div class="item listing 1085132 even featured selected">
<div>
<div id="results_list" class="photo">
<a href="/building/27-wooster/ph?featured=1">goodbye</a>
<a href="#">peach</a>
</div>
</div>
</div>
END_OF_HTML
doc = Nokogiri::HTML(html)
doc.css(".listing").each do |div|
a_tag = div.at_xpath('.//div[@id="results_list"]/a')
puts a_tag.text
end
--output:--
hello
world
goodbye
at_xpath()
returns the first matching element.
.//
restricts the search to descendants of the current element.