This is part of the source code of the booking website:


<script>
booking.ensureNamespaceExists('env');
booking.env.b_map_center_latitude = 53.36480155016638;
booking.env.b_map_center_longitude = -2.2752803564071655;
booking.env.b_hotel_id = '35523';
booking.env.b_query_params_no_ext = '?label=gen173nr-17CAEoggJCAlhYSDNiBW5vcmVmaFCIAQGYAS64AQTIAQTYAQHoAQH4AQs;sid=e1c9e4c7a000518d8a3725b9bb6e5306;dcid=1';
</script>



I want to extract booking.env.b_hotel_id, so that I get the value '35523'. How can I achieve this with Nokogiri and Mechanize?
Hope someone can help! Thanks! :)

Answer 0 (score: 6)
require 'mechanize'

agent = Mechanize.new
page = agent.get('http://www.booking.com/hotel/us/solera-by-stay-alfred.html?label=gen173nr-17CAEoggJCAlhYSDNiBW5vcmVmcgV1c19ueYgBAZgBMbgBBMgBBNgBAegBAfgBAg;sid=695d6598485cb1a8fd9e39c5de3878ba;dcid=4;checkin=2015-10-20;checkout=2015-10-21;dist=0;group_adults=2;room1=A%2CA;sb_price_type=total;srfid=cf5d76283b73d34a1d7e0d61cad6974e38a94351X1;type=total;ucfs=1&')
# Join the text of every <script> element, then find the hotel id assignment.
# The dots are escaped and [^']* stops the match at the closing quote.
match = page.search('script').text.scan(/booking\.env\.b_hotel_id\s*=\s*'[^']*'/)
puts match
# The id is the part between the single quotes.
puts match[0].split("'")[1]
Output:
booking.env.b_hotel_id = '1202411'
1202411
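If you only need the id itself, a capture group can replace the split step. The following is a minimal sketch that runs offline on a string; extract_hotel_id and the sample text are hypothetical names made up for illustration, and in the real script the input would presumably be page.search('script').text.

```ruby
# Hypothetical helper (not part of Mechanize or Nokogiri): returns the hotel id
# found in the given script text, or nil when no assignment is present.
def extract_hotel_id(script_text)
  # \s* tolerates optional spaces around "="; ([^']*) captures the quoted value.
  match = script_text.match(/booking\.env\.b_hotel_id\s*=\s*'([^']*)'/)
  match && match[1]
end

# Sample input based on the snippet in the question.
sample = <<~JS
  booking.ensureNamespaceExists('env');
  booking.env.b_map_center_latitude = 53.36480155016638;
  booking.env.b_hotel_id = '35523';
JS

puts extract_hotel_id(sample) # prints 35523
```

Returning nil instead of raising makes it easy to skip pages where the script block is missing.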
Pages that helped me solve this:
http://robdodson.me/crawling-pages-with-mechanize-and-nokogiri/
Parsing javascript function elements with nokogiri
Regular expression - starting and ending with a character string