I'm using https://github.com/ging/linkser to load a URL:
1.9.3p374 :001 > require 'linkser'
=> true
1.9.3p374 :002 > l = Linkser.parse 'http://sports.163.com/nba/'
=> #<Linkser::Objects::HTML:0x007f92019e99c8 @url="http://sports.163.com/nba/", @last_url="http://sports.163.com/nba/", @head=#<Net::HTTPOK 200 OK readbody=true>, @options={}>
1.9.3p374 :003 > l.title
encoding error : input conversion failed due to input error, bytes 0xC4 0x4E 0x42 0x41
=> "NBA,NBAֱҥ,\xD7钭ㄒ档"
Is it possible to convert this byte sequence into a correct UTF-8 string?
Answer:
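The page's actual encoding is GBK (a superset of GB2312). A quick look at the linkser source shows that it does no encoding handling at all, so decoding is left to Net::HTTP, which has a long-standing bug about exactly that, targeted for a fix in Ruby 2.0.0.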
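Until the library handles this, one workaround is to fix the encoding yourself on the raw response bytes. Below is a minimal sketch, assuming you can get at the undecoded body and a charset label (for example from `response.type_params["charset"]` on a Net::HTTP response, or from a `<meta>` tag). The helper names `charset_to_encoding` and `to_utf8` are hypothetical, not part of linkser or Net::HTTP. It widens `gb2312` to GBK because pages labelled GB2312 often contain GBK-only characters. A string that has already been mis-decoded (as the `l.title` value above appears to be) may not be recoverable this way.

```ruby
# encoding: utf-8
# Sketch only: relabel raw bytes with their real charset, then transcode to UTF-8.
# charset_to_encoding and to_utf8 are hypothetical helpers for illustration.

# Map a charset label (e.g. from Content-Type or a <meta> tag) to a Ruby Encoding.
# gb2312 is widened to its superset GBK; unknown or missing labels fall back to UTF-8.
def charset_to_encoding(label)
  name = label.to_s.strip.downcase
  name = "gbk" if name == "gb2312"
  Encoding.find(name)
rescue ArgumentError
  Encoding::UTF_8
end

# force_encoding only changes the label (on a copy, so the caller's string is untouched);
# encode then actually converts the bytes, replacing invalid/unmappable ones with "?".
def to_utf8(raw, label)
  enc = charset_to_encoding(label)
  raw.dup.force_encoding(enc).encode(Encoding::UTF_8,
                                     invalid: :replace, undef: :replace, replace: "?")
end

if __FILE__ == $0
  # Simulate what arrives over the wire: GBK bytes with no useful encoding label.
  wire = "NBA中文".encode(Encoding::GBK).force_encoding(Encoding::BINARY)
  puts to_utf8(wire, "gb2312") # prints NBA中文
end
```

Treating GB2312 as GBK is a pragmatic choice: every valid GB2312 byte sequence decodes identically as GBK, so widening costs nothing and avoids errors on GBK-only characters like the `0xC4 0x4E` pair in the error message above.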