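I am using the Mechanize Ruby gem to scrape some content from epinions.com. For some reason, some links are not interpreted correctly: Mechanize replaces ~ with ‾ in the href, and as a result it cannot click the link.
Here is a failing example, followed by a successful scrape: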
# script
require 'mechanize'

agent = Mechanize.new
page_1 = agent.get("http://www.epinions.com/webs-Web_Services-All-Merchants-AtomicPark_com/display_~reviews")
puts page_1.links_with(:href => /full_specs/, :text => /^View Information$/).last.inspect
page_2 = agent.get("http://www.epinions.com/webs-Web_Services-All-Merchants-Vanns_com/display_~reviews")
puts page_2.links_with(:href => /full_specs/, :text => /^View Information$/).last.inspect
# result
#<Mechanize::Page::Link
"View Information"
"/webs-Web_Services-All-Merchants-AtomicPark_com/display_‾full_specs">
#<Mechanize::Page::Link
"View Information"
"/webs-Web_Services-All-Merchants-Vanns_com/display_~full_specs">
Any idea why this happens?
Answer 0 (score: 0)
This works fine for me:
[14:29] arkham ~/Desktop [2.1.0]
↳ $ ruby mechanize.rb
#<Mechanize::Page::Link
"View Information"
"/webs-Web_Services-All-Merchants-AtomicPark_com/display_~full_specs">
#<Mechanize::Page::Link
"View Information"
"/webs-Web_Services-All-Merchants-Vanns_com/display_~full_specs">
Which version of Ruby are you using?
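If the overline still shows up on your setup, one possible stopgap is to put the tilde back before requesting the URL. This is only a minimal sketch of a workaround, not a fix for the underlying cause, and `normalize_href` is a hypothetical helper of my own, not part of Mechanize:

```ruby
# U+203E OVERLINE, the character that appears in place of "~" in the broken href.
OVERLINE = "\u203E"

# Hypothetical helper (not a Mechanize method): swap any overline back to a tilde.
def normalize_href(href)
  href.to_s.gsub(OVERLINE, "~")
end

# The broken href from the question, with the overline written as an escape.
broken = "/webs-Web_Services-All-Merchants-AtomicPark_com/display_\u203Efull_specs"
puts normalize_href(broken)
```

With the script from the question, that would presumably mean calling `agent.get(normalize_href(link.href))` instead of clicking the link, assuming the overline is the only thing wrong with the href.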