Question

I'm trying to get the Shift-JIS character code of a character in a unicode string. I don't know Python very well, but so far I have tried this:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from struct import *

data="臍"
udata=data.decode("utf-8")
data=udata.encode("shift-jis").decode("shift-jis")
code=unpack(data, "Q")
print code

But I get the error UnicodeEncodeError: 'ascii' codec can't encode character u'\u81cd' in position 0: ordinal not in range(128). The string is always a single character.

Answer 1

In Shift-JIS, this character is represented by the two-byte sequence 0xE4 0x60:

>>> data = u'\u81cd'
>>> data_shift_jis = data.encode('shift-jis')
>>> data_shift_jis
'\xe4`'
>>> hex(ord('`'))
'0x60'

So '\xe4\x60' is the Shift-JIS encoding of u'\u81cd'.
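For readers on Python 3, where string literals are already Unicode and encode() returns bytes, the same check might look like the sketch below. It assumes Python 3 and only uses the built-in "shift_jis" codec; the byte values are the ones from the session above.

```python
# Python 3 sketch: text literals are Unicode, encode() returns bytes.
data = "\u81cd"  # the character 臍
data_shift_jis = data.encode("shift_jis")

print(data_shift_jis)        # b'\xe4`'  (0x60 is the backtick character)
print(list(data_shift_jis))  # [228, 96], i.e. [0xE4, 0x60]
print(data_shift_jis.hex())  # e460
```

Iterating over a bytes object yields plain integers, so no ord() call is needed in Python 3.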

Answer 2

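In Python 2, when you write a string literal in a UTF-8 encoded source file, you can either keep it encoded (data = "臍"), or have Python decode it into a unicode string when it parses the program (data = u"臍"). The second option is the usual way to create strings when the source file is UTF-8 encoded.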
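Python 3 draws the same distinction with different types: a literal such as "臍" is always Unicode text (str), and the UTF-8 encoded form only appears as bytes when you ask for it. This minimal sketch, assuming Python 3, contrasts the two forms that correspond to Python 2's u"臍" and plain "臍":

```python
text = "\u81cd"                    # Unicode text (臍), like Python 2's u"臍"
utf8_bytes = text.encode("utf-8")  # encoded bytes, like Python 2's plain "臍"

print(len(text), ascii(text))   # 1 '\u81cd'
print(len(utf8_bytes), utf8_bytes)  # 3 b'\xe8\x87\x8d'

# Decoding the bytes recovers the original Unicode text.
assert utf8_bytes.decode("utf-8") == text
```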

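When you tried to convert to Shift-JIS, you ended up decoding the Shift-JIS bytes straight back into a Python unicode string. And when you tried to unpack it, you asked for "Q" (unsigned long long) when you really wanted "H" (unsigned short). Note also that unpack takes the format string as its first argument and the data as its second.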
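To see both problems concretely, here is a small Python 3 sketch (it assumes Python 3's str/bytes split). It shows that the encode-then-decode round trip just returns text, and that "Q" needs eight bytes while the two Shift-JIS bytes only fit "H":

```python
import struct

char = "\u81cd"  # 臍

# Encoding and then decoding again just returns the original text, not bytes.
round_trip = char.encode("shift_jis").decode("shift_jis")
print(type(round_trip).__name__, round_trip == char)  # str True

jis_bytes = char.encode("shift_jis")
print(struct.calcsize("Q"), struct.calcsize("H"))  # 8 2

# "Q" expects 8 bytes, so 2 bytes of Shift-JIS data raise struct.error.
try:
    struct.unpack("Q", jis_bytes)
except struct.error as exc:
    print("Q failed:", exc)

# The format string comes first; ">" selects big-endian byte order,
# so the lead byte 0xE4 ends up in the high half of the result.
(code,) = struct.unpack(">H", jis_bytes)
print(code, hex(code))  # 58464 0xe460
```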

Here are two samples that get the character's information:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from struct import *

# here we have a plain (byte) str that really holds a utf-8 encoded char
data="臍"
jis_data = data.decode('utf-8').encode("shift-jis")
# ">H" = big-endian unsigned short, i.e. exactly two bytes
code = unpack(">H", jis_data)[0]
print repr(data), repr(jis_data), code, hex(code)

# here python decodes the utf-8 encoded char for us
data=u"臍"
jis_data = data.encode("shift-jis")
code = unpack(">H", jis_data)[0]
print repr(data), repr(jis_data), code, hex(code)

The output is:

'\xe8\x87\x8d' '\xe4`' 58464 0xe460
u'\u81cd' '\xe4`' 58464 0xe460
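Both samples assume the character encodes to exactly two bytes. That holds for kanji such as 臍, but ASCII letters and half-width katakana take a single byte in Shift-JIS, and unpack(">H", ...) would raise struct.error for them. The following is a minimal Python 3 sketch that handles either length by using int.from_bytes instead of struct; the function name shift_jis_code is hypothetical.

```python
def shift_jis_code(char: str) -> int:
    """Return the Shift-JIS code of a single character as an integer.

    Hypothetical helper: works for 1-byte and 2-byte Shift-JIS characters
    and raises UnicodeEncodeError if the character has no Shift-JIS form.
    """
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    encoded = char.encode("shift_jis")
    # Big-endian: the first (lead) byte becomes the most significant byte.
    return int.from_bytes(encoded, "big")


print(hex(shift_jis_code("\u81cd")))  # 0xe460 (臍, two bytes)
print(hex(shift_jis_code("A")))       # 0x41 (ASCII, one byte)
print(hex(shift_jis_code("\uff71")))  # 0xb1 (half-width katakana, one byte)
```

int.from_bytes treats any number of bytes as one big-endian integer, so the caller does not need to know the encoded length in advance.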
