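I know Ruby has a pretty bad reputation when it comes to pulling content off the web, with lots of encoding errors and the like. How can I force the array below into its true encoding?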
["0x4E", "0x3C", "0x89", "0x50", "0xC3", "0x47", "0xFF", "0x70", "xFF", "0x2F", "0xA2", "0xB3", "0x98"]
First I tried encoding to UTF-8:
irb(main):012:0> data = ["0x4E", "0x3C", "0x89", "0x50", "0xC3", "0x47", "0xFF", "0x70", "xFF", "0x2F", "0xA2", "0xB3", "0x98"]
irb(main):013:0> data.each do |char|
irb(main):014:1* puts char.encode!("UTF-8", invalid: :replace, undef: :replace)
irb(main):015:1> end
0x4E
0x3C
0x89
0x50
0xC3
0x47
0xFF
0x70
xFF
0x2F
0xA2
0xB3
0x98
=> ["0x4E", "0x3C", "0x89", "0x50", "0xC3", "0x47", "0xFF", "0x70", "xFF", "0x2F", "0xA2", "0xB3", "0x98"]
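A quick check suggests why nothing changed. This is a sketch that uses a shortened copy of the array for illustration. Each element is ordinary ASCII text, and transcoding pure ASCII to UTF-8 leaves it byte-for-byte identical.

```ruby
# Shortened copy of the question's array, for illustration only.
data = ["0x4E", "0x3C", "0x89", "0x50"]

data.each do |s|
  # Each element is literal text: "0", "x" and two hex digits, four bytes in all.
  puts "#{s.inspect} -> #{s.bytesize} bytes, ascii_only? #{s.ascii_only?}"
end

# Transcoding pure-ASCII text to UTF-8 cannot change it.
converted = data.map { |s| s.encode("UTF-8", invalid: :replace, undef: :replace) }
p converted == data # => true
```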
So it looks like these characters are already UTF-8. Next I tried ISO-8859-1:
irb(main):086:0> data.each { |char|
irb(main):087:1* puts char.encode!("iso-8859-1", invalid: :replace, undef: :replace)
irb(main):088:1> }
x4E
x3C
x89
x50
xC3
x47
xFF
x70
xFF
x2F
xA2
xB3
x98
=> ["x4E", "x3C", "x89", "x50", "xC3", "x47", "xFF", "x70", "xFF", "x2F", "xA2", "xB3", "x98"]
That didn't work either, though it seems to have dropped the 0.
So I went further and tried URI.decode:
irb(main):093:0> require 'uri'
=> true
irb(main):094:0> data.each { |char|
irb(main):095:1* puts URI.decode(char)
irb(main):096:1> }
x4E
x3C
x89
x50
xC3
x47
xFF
x70
xFF
x2F
xA2
xB3
x98
=> ["x4E", "x3C", "x89", "x50", "xC3", "x47", "xFF", "x70", "xFF", "x2F", "xA2", "xB3", "x98"]
Wouldn't you know it? That didn't work either.
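For reference, URI.decode only rewrites %XX escape sequences, so text such as "0x4E" or "x4E" contains nothing for it to decode. URI.decode was also marked obsolete and removed in Ruby 3.0. If the bytes really did arrive percent-encoded in a URL, a sketch using URI.decode_www_form_component might look like this. The escaped string is a hypothetical example, since the original URL is unknown. Passing Encoding::BINARY keeps the result from being labelled UTF-8. Note that this method also turns "+" into a space, as form encoding requires.

```ruby
require "uri"

# Hypothetical percent-encoded input; the real URL is not known.
escaped = "%4E%3C%89%50"

# Decode %XX escapes into raw bytes and label the result as binary.
decoded = URI.decode_www_form_component(escaped, Encoding::BINARY)

p decoded.bytes.map { |b| format("0x%02X", b) } # => ["0x4E", "0x3C", "0x89", "0x50"]
```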
Is there a way to get the characters back to their original form? If it helps, this data comes from a URL, but I don't have the full URL.
Answer (score: 1)
Your array
["0x4E", "0x3C", "0x89", "0x50", "0xC3", "0x47", "0xFF", "0x70", "xFF", "0x2F", "0xA2", "0xB3", "0x98"]
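is an array of strings, not of bytes. Each string is four characters of text (apart from "xFF", which has three). The first one is "0x4E": a zero, a lowercase x, a 4 and an E.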
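If those strings are all you have, here is a sketch that turns them back into integers. It strips the "0x" prefix and reads the rest as base 16. Treating the ninth element "xFF" as a typo for "0xFF" is an assumption.

```ruby
data = ["0x4E", "0x3C", "0x89", "0x50", "0xC3", "0x47", "0xFF", "0x70", "xFF", "0x2F", "0xA2", "0xB3", "0x98"]

# Strip an optional "0x" (or a bare "x", as in the ninth element),
# then parse the remaining two characters as a hexadecimal number.
bytes = data.map { |s| s.sub(/\A0?x/i, "").to_i(16) }
p bytes # => [78, 60, 137, 80, 195, 71, 255, 112, 255, 47, 162, 179, 152]
```

String#to_i silently returns 0 for text it cannot parse, so for untrusted input a stricter check may be worth adding.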
What you probably want is an array of hexadecimal integer values, like:
data = [0x4E, 0x3C, 0x89, 0x50, 0xC3, 0x47, 0xFF, 0x70, 0xFF, 0x2F, 0xA2, 0xB3, 0x98]
To get the corresponding characters, you can use Integer#chr:
p data.map{|c|c.chr} #-> ["N", "<", "\x89", "P", "\xC3", "G", "\xFF", "p", "\xFF", "/", "\xA2", "\xB3", "\x98"]
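The results above carry two different encodings, and that determines what happens when they are converted. A short sketch:

```ruby
low  = 0x4E.chr
high = 0x89.chr

# Integer#chr with no argument: values below 0x80 give a US-ASCII string,
# values 0x80..0xFF give a binary (ASCII-8BIT) string holding that one byte.
p low.encoding == Encoding::US_ASCII # => true
p high.encoding == Encoding::BINARY  # => true
p high.bytes                         # => [137]
```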
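These characters can then be "encoded", but every byte of 0x80 or above is replaced, because a binary (ASCII-8BIT) string has no defined mapping from those bytes to characters: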
p data.map { |char|
char.chr.encode('utf-8', invalid: :replace, undef: :replace)
} #["N", "<", "\uFFFD", "P", "\uFFFD", "G", "\uFFFD", "p", "\uFFFD", "/", "\uFFFD", "\uFFFD", "\uFFFD"]
p data.map { |char|
char.chr.encode('iso-8859-1', invalid: :replace, undef: :replace)
} #["N", "<", "?", "P", "?", "G", "?", "p", "?", "/", "?", "?", "?"]
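Converting each byte separately loses the byte sequence, though. Here is a sketch of a common alternative, assuming the integers above are the raw bytes in order. Pack them into one binary string, relabel it with force_encoding, and test each candidate label with valid_encoding?. force_encoding only changes the label; it does not convert any bytes. The bytes alone cannot prove which encoding is the true one, and because ISO-8859-1 accepts every byte, a passing ISO-8859-1 check is weak evidence.

```ruby
bytes = [0x4E, 0x3C, 0x89, 0x50, 0xC3, 0x47, 0xFF, 0x70, 0xFF, 0x2F, 0xA2, 0xB3, 0x98]

# pack("C*") writes each integer as one unsigned byte; the result is a binary string.
raw = bytes.pack("C*")

# Relabel a copy as UTF-8 and check it: 0x89 cannot start a UTF-8 sequence.
as_utf8 = raw.dup.force_encoding("UTF-8")
p as_utf8.valid_encoding? # => false

# Every byte is a valid ISO-8859-1 character, so this label always "fits".
as_latin1 = raw.dup.force_encoding("ISO-8859-1")
p as_latin1.valid_encoding? # => true

# Transcode the Latin-1 reading to UTF-8 (each byte n becomes code point U+00nn).
p as_latin1.encode("UTF-8")
```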