Mechanize replaces `~` with `‾` when reading the href of a link from a web page

Time: 2014-01-12 11:51:31

Tags: ruby web-scraping screen-scraping mechanize

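I am using the Mechanize Ruby gem to scrape some content from epinions.com. For some reason, some links are not interpreted correctly: Mechanize replaces the `~` in the href with `‾`. As a result, Mechanize cannot click the link.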

Here is an unsuccessful example, followed by a successful scrape:

# script

require 'mechanize'

agent = Mechanize.new

page_1 = agent.get("http://www.epinions.com/webs-Web_Services-All-Merchants-AtomicPark_com/display_~reviews")
puts page_1.links_with(:href => /full_specs/, :text => /^View Information$/).last.inspect

page_2 = agent.get("http://www.epinions.com/webs-Web_Services-All-Merchants-Vanns_com/display_~reviews")
puts page_2.links_with(:href => /full_specs/, :text => /^View Information$/).last.inspect

# result

#<Mechanize::Page::Link
 "View Information"
 "/webs-Web_Services-All-Merchants-AtomicPark_com/display_‾full_specs">
#<Mechanize::Page::Link
 "View Information"
 "/webs-Web_Services-All-Merchants-Vanns_com/display_~full_specs">

Any idea why this happens?

1 answer:

Answer 0 (score: 0)

This works fine for me:

[14:29] arkham ~/Desktop [2.1.0]
↳ $ ruby mechanize.rb
#<Mechanize::Page::Link
 "View Information"
 "/webs-Web_Services-All-Merchants-AtomicPark_com/display_~full_specs">
#<Mechanize::Page::Link
 "View Information"
 "/webs-Web_Services-All-Merchants-Vanns_com/display_~full_specs">

Which version of Ruby are you using?
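
If the overline still shows up on your setup, one stopgap is to normalize the href before following it. The following is only a minimal sketch, assuming the only damage is `~` turning into `‾` (U+203E). `normalize_tilde` is a hypothetical helper written for this example, not part of Mechanize, and it does not address whatever causes the substitution in the first place.

```ruby
# Hypothetical helper (not a Mechanize API): swap the overline
# character U+203E back to an ASCII tilde in an href string.
def normalize_tilde(href)
  href.to_s.tr("\u203E", "~")
end

# The broken href from the question, with the overline written as an escape.
broken = "/webs-Web_Services-All-Merchants-AtomicPark_com/display_\u203Efull_specs"

# Prints the same path with an ASCII tilde in place of the overline.
puts normalize_tilde(broken)
```

With Mechanize, this could be used as `agent.get(normalize_tilde(link.href))` instead of clicking the link, assuming `link` is one of the objects returned by `links_with`.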