Converting ASCII-8BIT to UTF-8 with non-Latin (Cyrillic) symbols in Net::HTTP.get_response.body

Date: 2012-07-28 12:33:30

Tags: ruby-on-rails 8-bit

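I need to fetch some data via Net::HTTP, and the response I receive is encoded as ASCII-8BIT. The question is how to convert it to UTF-8 while keeping all non-Latin symbols.

With @content.encode('utf-8', 'binary', :invalid => :replace, :undef => :replace, :replace => '') I lose all Cyrillic symbols.

With @content.encode('utf-8', 'binary') I get a "\xCB" from ASCII-8BIT to UTF-8 error.

With @content.force_encoding("UTF-8") I get garbled characters instead of Cyrillic symbols.

I could not find an answer by googling.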

1 answer:

Answer 0: (score: 3)

The problem was solved with:
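begin
    # Net::HTTP tags the body ASCII-8BIT; first try reading the bytes as UTF-8.
    cleaned = response.body.dup.force_encoding('UTF-8')
    unless cleaned.valid_encoding?
        # Not valid UTF-8, so treat the bytes as Windows-1251 and transcode.
        cleaned = response.body.encode('UTF-8', 'Windows-1251')
    end
    content = cleaned
rescue EncodingError
    # content is still unassigned here, so transcode the body again,
    # replacing any bytes that cannot be converted.
    content = response.body.encode('UTF-8', 'Windows-1251', invalid: :replace, undef: :replace)
end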
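
For reference, the same idea can be wrapped in a small method and tried without a network call. This is only a sketch: the to_utf8 name and its fallback parameter are hypothetical, not part of the original answer, and it assumes that any body which is not valid UTF-8 is Windows-1251.

```ruby
# Hypothetical helper (not from the original answer): returns a UTF-8 copy of
# raw bytes, assuming that any input which is not valid UTF-8 is in `fallback`.
def to_utf8(raw, fallback = 'Windows-1251')
  candidate = raw.dup.force_encoding('UTF-8')
  return candidate if candidate.valid_encoding?

  # Reinterpret the bytes as the fallback encoding and transcode them,
  # replacing anything that has no UTF-8 equivalent.
  raw.encode('UTF-8', fallback, invalid: :replace, undef: :replace)
end

# The Cyrillic word "Привет" as Windows-1251 bytes, tagged ASCII-8BIT
# the same way a Net::HTTP response body is.
body = [0xCF, 0xF0, 0xE8, 0xE2, 0xE5, 0xF2].pack('C*')

text = to_utf8(body)
puts text.encoding
puts text == "\u041F\u0440\u0438\u0432\u0435\u0442"
```

With a real response this would be called as to_utf8(response.body). The validity check is a heuristic: a Windows-1251 byte sequence that happens to also be valid UTF-8 would be returned unchanged, so if the server is known to send Windows-1251, transcoding directly is safer.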

here is more complete data