Invalid byte sequence in UTF-8

Time: 2013-02-14 14:10:37

Tags: ruby

I use https://github.com/ging/linkser to load a URL:

1.9.3p374 :001 > require 'linkser'
 => true 
1.9.3p374 :002 > l = Linkser.parse 'http://sports.163.com/nba/'
 => #<Linkser::Objects::HTML:0x007f92019e99c8 @url="http://sports.163.com/nba/", @last_url="http://sports.163.com/nba/", @head=#<Net::HTTPOK 200 OK readbody=true>, @options={}> 
1.9.3p374 :003 > l.title
encoding error : input conversion failed due to input error, bytes 0xC4 0x4E 0x42 0x41
 => "NBA,NBAֱҥ,\xD7钭ㄒ档" 

Is it possible to convert this byte sequence into a correct UTF-8 string?

1 answer:

Answer 0 (score: 0)

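The page's actual encoding is GBK (a superset of GB2312). A quick look at the linkser source shows that it does no encoding handling at all, so this is left to Net::HTTP, which has a long-standing bug about exactly that, targeted for Ruby 2.0.0.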
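Until Net::HTTP or linkser handles this, the raw bytes can be converted by hand: re-tag them as GBK and transcode to UTF-8. The sketch below assumes you have the raw response body as a byte string; `decode_body` is a hypothetical helper (not part of linkser or Net::HTTP), and the sample bytes are an illustrative GBK encoding of "NBA中文", not the real page content.

```ruby
# Hypothetical helper: reinterpret raw body bytes in the given charset
# and transcode them to UTF-8. Not part of linkser or Net::HTTP.
def decode_body(raw, charset = nil)
  charset = charset.to_s
  # Assumption: fall back to GBK when no charset is known. Pages labelled
  # gb2312 are also decoded as GBK, since GBK is a superset of GB2312.
  charset = 'GBK' if charset.empty? || charset.casecmp('gb2312').zero?
  raw.dup
     .force_encoding(charset)                       # tag bytes, no conversion yet
     .encode('UTF-8', invalid: :replace,            # transcode; replace bad bytes
                      undef: :replace, replace: '?')
end

# Illustrative input: "NBA中文" encoded as GBK bytes (中 = D6 D0, 文 = CE C4).
raw = [0x4E, 0x42, 0x41, 0xD6, 0xD0, 0xCE, 0xC4].pack('C*')
puts decode_body(raw)
```

If the server sends a charset in its `Content-Type` header, `Net::HTTPResponse#type_params['charset']` returns it (or `nil`), and it can be passed straight in as the second argument; the GBK fallback is only a guess suited to this particular site.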