Question

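I wrote the following script to replace non-alphanumeric characters with hex-encoded tokens and then convert them back. However, I can't figure out why unhexlify doesn't work. Any suggestions?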

import binascii, timeit, re

damn_string = "asjke5234nlkfs$sfj3.$sfjk."

def convert_string(s):
    return ''.join('__UTF%s__' % binascii.hexlify(c.encode('utf-16')) if not c.isalnum() else c for c in s.lower())

def convert_back(s):
    for i in re.findall('__UTF([a-f0-9]{8})__', s): # For testing
        print binascii.unhexlify(i).decode('utf-16')
    return re.sub('__UTF([a-f0-9]{8})__', binascii.unhexlify('\g<1>').decode('utf-16'), s)

convert = convert_string(damn_string)
print convert
print convert_back(convert)

This produces the following output:

asjke5234nlkfs__UTFfffe2400__sfj3__UTFfffe2e00____UTFfffe2400__sfjk__UTFfffe2e00__
$
.
$
.
Traceback (most recent call last):
  File "test.py", line 131, in <module>
    print convert_back(convert)
  File "test.py", line 127, in convert_back
    return re.sub('__UTF([a-f0-9]{8})__', binascii.unhexlify('\g<1>').decode('utf-16'), s)
TypeError: Odd-length string

Answer 1

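My bad. It took me a long time to realize that re.sub cannot pass the group string this way: the replacement argument is evaluated before re.sub runs, so binascii.unhexlify receives the literal five-character string '\g<1>' rather than the matched hex digits, hence the "Odd-length string" error. One way to fix it is to pass a function as the replacement: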

return re.sub('__UTF([a-f0-9]{8})__', lambda x: binascii.unhexlify(x.group(1)).decode('utf-16'), s)
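For reference, here is a minimal sketch of the whole round trip with the callable replacement in place. It assumes Python 3 (the question uses Python 2 print statements), so the output of hexlify is decoded to text before formatting; the name TOKEN is introduced here only for illustration. Note that convert_string lowercases its input, as in the question, so the round trip only restores strings that were lowercase to begin with.

```python
import binascii
import re

# Hypothetical helper name: the same pattern the question uses, compiled once.
TOKEN = re.compile(r'__UTF([a-f0-9]{8})__')

def convert_string(s):
    # Replace each non-alphanumeric character with a __UTF<hex>__ token.
    # Assumption: Python 3, where hexlify returns bytes, hence .decode('ascii').
    return ''.join(
        c if c.isalnum()
        else '__UTF%s__' % binascii.hexlify(c.encode('utf-16')).decode('ascii')
        for c in s.lower()
    )

def convert_back(s):
    # The function is called once per match, so unhexlify receives the
    # captured hex digits instead of the literal template '\g<1>'.
    return TOKEN.sub(
        lambda m: binascii.unhexlify(m.group(1)).decode('utf-16'), s
    )

original = "asjke5234nlkfs$sfj3.$sfjk."
encoded = convert_string(original)
decoded = convert_back(encoded)
print(encoded)
print(decoded)
```

The pattern only matches eight hex digits, so this sketch covers characters that encode to a BOM plus one UTF-16 code unit; characters outside the Basic Multilingual Plane would need a longer token.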
