Question

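When I collect information containing the pound sign '£' from an external source (my bank, for example) via a CSV file and write it to Postgres with ActiveRecord, I get this error:

PG::CharacterNotInRepertoire: ERROR: invalid byte sequence for encoding "UTF8": 0xa3

0xa3 is the hex code for the £ sign in single-byte encodings such as ISO-8859-1. The received wisdom is to explicitly specify UTF-8 on the string while replacing invalid byte sequences: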

string.encode('UTF-8', {:invalid => :replace, :undef => :replace, :replace => '?'})

This stops the error, but it is a lossy fix because '£' is converted to '?'.
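A minimal sketch of the loss, assuming the CSV field arrives as a single 0xa3 byte that Ruby has tagged as UTF-8. The variable names are illustrative, and `scrub` (Ruby 2.1+) stands in here for the `encode` call above:

```ruby
# A lone 0xa3 byte tagged as UTF-8, as it might arrive from a Latin-1 CSV file.
raw = "\xA3100".dup.force_encoding('UTF-8')

# On its own, 0xa3 is only a continuation byte, so the string is not valid UTF-8.
valid_before = raw.valid_encoding?

# Replacing invalid bytes yields a valid string, but the pound sign is gone.
scrubbed = raw.scrub('?')
valid_after = scrubbed.valid_encoding?
```

Here `scrubbed` holds "?100": Postgres would accept it, but the currency symbol cannot be recovered afterwards.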

UTF-8 can represent the '£' sign, so what can be done to repair the invalid byte sequence and keep the '£' sign?

Answer 1

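I am answering my own question, with thanks to Michael Fuhr, who explained that the UTF-8 byte sequence for the pound sign is 0xc2 0xa3. So all you have to do is find every occurrence of 0xa3 (163) and put 0xc2 (194) in front of it: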
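The two byte sequences can be checked directly in Ruby. This is only an illustrative sketch; the variable names are not part of the original answer:

```ruby
# U+00A3 (£) encoded as UTF-8 is the two-byte sequence 0xc2 0xa3.
utf8_bytes = "\u00A3".encode('UTF-8').bytes        # [194, 163]

# In ISO-8859-1 the same character is the single byte 0xa3.
latin1_bytes = "\u00A3".encode('ISO-8859-1').bytes # [163]
```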

array_bytes = string.bytes
new_pound_ptr = 0
# Look for the £ byte (0xa3 = 163)
pound_ptr = array_bytes.index(163)
while !pound_ptr.nil?
  # A bare £ byte is one that is not preceded by 0xc2 (194)
  if (pound_ptr == 0) || (array_bytes[pound_ptr - 1] != 194)
    array_bytes.insert(pound_ptr, 194)
    pound_ptr += 1 # the 0xa3 byte has moved one place to the right
  end
  # Resume the search just past the £ byte that was handled
  new_pound_ptr = pound_ptr + 1
  # index is relative to the slice, so add the offset back on
  rest = array_bytes[new_pound_ptr..-1].index(163)
  pound_ptr = rest.nil? ? nil : rest + new_pound_ptr
end
# Convert bytes to 8-bit unsigned char, and tag as UTF-8 (skipped if no £ was found)
string = array_bytes.pack('C*').force_encoding('UTF-8') unless new_pound_ptr == 0
# Can now write the string to the model without the invalid byte sequence error
hash["description"] = string
Model.create!(hash)
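The byte-insertion approach repairs only the £ sign. If the whole file is in a single-byte encoding, transcoding it may be simpler and would also cover other characters such as é. The sketch below assumes the bank's export is ISO-8859-1, which the question does not confirm; `latin1_to_utf8` is a hypothetical helper name, and 'Windows-1252' could be substituted if that turns out to be the real encoding:

```ruby
# Hypothetical helper: return the text as valid UTF-8.
# Assumption: any input that is not already valid UTF-8 is ISO-8859-1.
def latin1_to_utf8(raw)
  candidate = raw.dup.force_encoding('UTF-8')
  # Leave strings that are already valid UTF-8 untouched.
  return candidate if candidate.valid_encoding?

  # Reinterpret the raw bytes as ISO-8859-1, then transcode to UTF-8.
  raw.dup.force_encoding('ISO-8859-1').encode('UTF-8')
end
```

It could be applied to the field before the record is created, for example `hash["description"] = latin1_to_utf8(string)`.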

I have had a lot of help on this Stack Overflow forum, and I hope I have helped someone else.
