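When I collect information containing the pound sign '£' from an external source (for example, my bank) via a CSV file and post it to Postgres using ActiveRecord, I get this error:
PG::CharacterNotInRepertoire: ERROR: invalid byte sequence for encoding "UTF8": 0xa3
0xa3 is the hex code for the £ sign (its single-byte form in ISO-8859-1). The received wisdom is to explicitly encode the string as UTF-8 while replacing invalid byte sequences: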
string.encode('UTF-8', invalid: :replace, undef: :replace, replace: '?')
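This stops the error, but it is a lossy fix because '£' is converted to '?'.
UTF-8 can represent the '£' sign, so what can be done to fix the invalid byte sequence and keep the '£' sign?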
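To make the problem concrete, here is a minimal sketch that reproduces it outside the database. The sample text "Payment £20.00" is made up for illustration, and the copy passed to encode is tagged as binary with String#b, which is an assumption of this sketch rather than something stated in the question.

```ruby
# Assumed sample: the lone byte 0xa3 as it might arrive from the CSV file
raw = "Payment \xA320.00"

# A bare 0xa3 is not a complete UTF-8 sequence, so the string is invalid
puts raw.valid_encoding?

# Real UTF-8 spends two bytes on the pound sign: 0xc2 0xa3
pound_bytes = "\u00A3".bytes
p pound_bytes

# The replace-based fix from the question, applied to a binary-tagged copy:
# the error goes away, but so does the pound sign
lossy = raw.b.encode('UTF-8', invalid: :replace, undef: :replace, replace: '?')
puts lossy
```

The validity check fails for the raw string, while the replaced string is valid but has lost the currency symbol.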
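Answer 0 (score: 2)
I am answering my own question, thanks to Michael Fuhr, who explained that the UTF-8 byte sequence for the pound sign is 0xc2 0xa3. So all you have to do is find each occurrence of 0xa3 (163) and put 0xc2 (194) in front of it, unless it is already there...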
array_bytes = string.bytes
new_pound_ptr = 0
# Look for £ sign (byte 0xa3 = 163)
pound_ptr = array_bytes.index(163)
while !pound_ptr.nil?
  # index was relative to the searched slice; new_pound_ptr is set at end of block
  pound_ptr += new_pound_ptr
  # The following statement finds a £ byte that lacks its 0xc2 (194) lead byte.
  # Note: it assumes 0xa3 never follows another lead byte (e.g. 0xc3 0xa3 is 'ã').
  if (pound_ptr == 0) || (array_bytes[pound_ptr - 1] != 194)
    array_bytes.insert(pound_ptr, 194)
    pound_ptr += 1 # the £ byte has shifted one place to the right
  end
  # Search remainder of array, starting just past this £ byte
  new_pound_ptr = pound_ptr + 1
  pound_ptr = array_bytes[new_pound_ptr..-1].index(163)
end
# Convert bytes to 8-bit unsigned char, and UTF-8 (skipped if no £ byte was found)
string = array_bytes.pack('C*').force_encoding('UTF-8') unless new_pound_ptr == 0
# Can now write string to model without the invalid byte sequence error
hash["description"] = string
Model.create!(hash)
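The same idea can also be sketched as a small self-contained method that builds a new byte array in a single pass, which avoids the index bookkeeping. The method name fix_pound_bytes, the constants, and the sample string are hypothetical and not part of the original answer. It carries the same assumption as above: every 0xa3 byte is meant to be a pound sign.

```ruby
POUND_BYTE = 0xa3 # 163, the trailing byte of £ in UTF-8
LEAD_BYTE  = 0xc2 # 194, the byte that must precede it

# Hypothetical helper: returns a UTF-8 string in which every 0xa3 byte
# that is not already preceded by 0xc2 gets that lead byte inserted.
def fix_pound_bytes(string)
  fixed = []
  string.bytes.each do |byte|
    # Only add the lead byte when the previous output byte is not 0xc2
    fixed << LEAD_BYTE if byte == POUND_BYTE && fixed.last != LEAD_BYTE
    fixed << byte
  end
  fixed.pack('C*').force_encoding('UTF-8')
end

# Assumed sample: one bare £ byte and one £ that is already valid UTF-8
raw = "Fee \xA35.00 and \xC2\xA37.50".b
fixed = fix_pound_bytes(raw)
puts fixed
puts fixed.valid_encoding?
```

If the whole file is known to be ISO-8859-1 rather than partly UTF-8, transcoding it with string.force_encoding('ISO-8859-1').encode('UTF-8') may be a simpler route, but that depends on what the bank actually exports.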
I have had a lot of help on this stackoverflow forum, and I hope I have been able to help someone else.