I always get UndefinedConversionError in Ruby 2.0 when scraping with Mechanize

Time: 2013-09-01 14:17:49

Tags: ruby encoding utf-8 mechanize iconv

When I try to submit a textarea with Mechanize on Ruby 2.0, I always get this error:

Encoding::UndefinedConversionError: U+0151 from UTF-8 to ISO-8859-1

I then tried converting the text with Iconv, with a similar result:

Iconv.iconv("LATIN1", "UTF-8", text)

This gives the following error message:

Iconv::IllegalSequence: "őzködik, melyet "...

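This happens because the text contains Eastern European characters (here ő, U+0151). How can I avoid this problem, or how do I convert correctly between the two encodings?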
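For reference, a minimal sketch (not part of the original question) that reproduces the failure without Mechanize. It assumes only Ruby's built-in `String#encode`; the helper name `convertible?` is made up for illustration. It shows that U+0151 has no code point in ISO-8859-1 (Latin-1) but does exist in ISO-8859-2 (Latin-2), which is why any strict conversion to Latin-1 has to fail.

```ruby
# U+0151 is LATIN SMALL LETTER O WITH DOUBLE ACUTE ("ő").
text = "\u0151"

# Hypothetical helper: true if str can be transcoded to target without loss.
def convertible?(str, target)
  str.encode(target)
  true
rescue Encoding::UndefinedConversionError
  false
end

puts convertible?(text, "ISO-8859-1")  # prints false
puts convertible?(text, "ISO-8859-2")  # prints true
```

If the receiving site really expects Latin-1, the character cannot be sent as-is and has to be replaced or escaped in some way.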

1 Answer:

Answer 0: (score: 0)

I found an elegant solution:

replacements = [["À", "À"], ["Á", "Á"], ["Â", "Â"], ["Ã", "Ã"], ["Ä", "Ä"], ["Å", "Å"], ["Æ", "Æ"], ["Ç", "Ç"], ["È", "È"], ["É", "É"], ["Ê", "Ê"], ["Ë", "Ë"], ["Ì", "Ì"], ["Í", "Í"], ["Î", "Î"], ["Ï", "Ï"], ["Ð", "Ð"], ["Ñ", "Ñ"], ["Ò", "Ò"], ["Ó", "Ó"], ["Ô", "Ô"], ["Õ", "Õ"], ["Ö", "Ö"], ["Ø", "Ø"], ["Ù", "Ù"], ["Ú", "Ú"], ["Û", "Û"], ["Ü", "Ü"], ["Ý", "Ý"], ["Þ", "Þ"], ["ß", "ß"], ["à", "à"], ["á", "á"], ["â", "â"], ["ã", "ã"], ["ä", "ä"], ["å", "å"], ["æ", "æ"], ["ç", "ç"], ["è", "è"], ["é", "é"], ["ê", "ê"], ["ë", "ë"], ["ì", "ì"], ["í", "í"], ["î", "î"], ["ï", "ï"], ["ð", "ð"], ["ñ", "ñ"], ["ò", "ò"], ["ó", "ó"], ["ô", "ô"], ["õ", "õ"], ["ö", "ö"], ["ø", "ø"], ["ù", "ù"], ["ú", "ú"], ["û", "û"], ["ü", "ü"], ["ý", "ý"], ["þ", "þ"], ["ÿ", "ÿ"]]

# Apply each [from, to] pair to str in place, then return str.
def replace(str, replacements)
  replacements.each { |from, to| str.gsub!(from, to) }
  str
end

my_string = replace(my_string, replacements)
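Note that the table above only covers characters that already exist in Latin-1, and as shown here each pair maps a character to itself, so it would not touch ő (U+0151). As a hedged alternative, here is a sketch that uses the options of Ruby's built-in `String#encode` instead of a hand-written table. The sample string and the variable names are assumptions for illustration, and whether the target site accepts `?` or numeric character references in place of the original character is also an assumption you would need to verify.

```ruby
# Sample text containing U+0151 (ő) and U+00F6 (ö): "őzködik".
text = "\u0151zk\u00F6dik"

# Option 1: lossy conversion, unencodable characters become "?".
lossy = text.encode("ISO-8859-1", undef: :replace, replace: "?")

# Option 2: keep the information by emitting an HTML numeric character
# reference for every character that Latin-1 cannot represent.
escaped = text.encode(
  "ISO-8859-1",
  fallback: ->(ch) { format("&#%d;", ch.ord) }
)

# Convert back to UTF-8 only so the results print cleanly on a UTF-8 terminal.
puts lossy.encode("UTF-8")    # prints ?zködik
puts escaped.encode("UTF-8")  # prints &#337;zködik
```

Characters that Latin-1 can represent, such as ö, pass through unchanged in both options; only the unencodable ones are replaced.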