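I wrote the following script to replace non-alphanumeric characters with hex tokens and then convert them back. However, I can't figure out why unhexlify doesn't work in the substitution. Any suggestions?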
import binascii, timeit, re

damn_string = "asjke5234nlkfs$sfj3.$sfjk."

def convert_string(s):
    return ''.join('__UTF%s__' % binascii.hexlify(c.encode('utf-16')) if not c.isalnum() else c for c in s.lower())

def convert_back(s):
    for i in re.findall('__UTF([a-f0-9]{8})__', s): # For testing
        print binascii.unhexlify(i).decode('utf-16')
    return re.sub('__UTF([a-f0-9]{8})__', binascii.unhexlify('\g<1>').decode('utf-16'), s)

convert = convert_string(damn_string)
print convert
print convert_back(convert)
This produces the following output:
asjke5234nlkfs__UTFfffe2400__sfj3__UTFfffe2e00____UTFfffe2400__sfjk__UTFfffe2e00__
$
.
$
.
Traceback (most recent call last):
  File "test.py", line 131, in <module>
    print convert_back(convert)
  File "test.py", line 127, in convert_back
    return re.sub('__UTF([a-f0-9]{8})__', binascii.unhexlify('\g<1>').decode('utf-16'), s)
TypeError: Odd-length string
Answer 0: (score: 0)
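My bad. It took me a long time to realize that re.sub cannot hand over the group string this way. The replacement argument is evaluated before re.sub is ever called, so unhexlify receives the literal five-character string '\g<1>' instead of the matched hex digits, hence "Odd-length string". One way around it is to pass a function as the replacement, which is called with each match object: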
return re.sub('__UTF([a-f0-9]{8})__', lambda x: binascii.unhexlify(x.group(1)).decode('utf-16'), s)
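For reference, here is a minimal round-trip sketch of the same idea. It assumes Python 3 (the original post is Python 2), where hexlify returns bytes and so needs an extra .decode('ascii'). The names TOKEN, encoded, decoded and eager_call_failed are made up for illustration and are not part of the original script.

```python
import binascii
import re

# Same pattern as in the question: 8 hex digits = 2-byte BOM + one UTF-16 code unit.
TOKEN = re.compile(r'__UTF([a-f0-9]{8})__')

def convert_string(s):
    """Replace each non-alphanumeric character with a __UTF<hex>__ token."""
    return ''.join(
        c if c.isalnum()
        # In Python 3, hexlify returns bytes, so decode it before formatting.
        else '__UTF%s__' % binascii.hexlify(c.encode('utf-16')).decode('ascii')
        for c in s.lower()
    )

def convert_back(s):
    """Turn every __UTF<hex>__ token back into its original character."""
    # The lambda is called once per match, so group(1) holds real hex digits.
    return TOKEN.sub(
        lambda m: binascii.unhexlify(m.group(1)).decode('utf-16'), s)

# What the original call did: unhexlify ran immediately on the literal
# text \g<1>, before re.sub had matched anything.
try:
    binascii.unhexlify(r'\g<1>')
    eager_call_failed = False
except (binascii.Error, TypeError):
    eager_call_failed = True

damn_string = "asjke5234nlkfs$sfj3.$sfjk."
encoded = convert_string(damn_string)
decoded = convert_back(encoded)
```

Because convert_string lowercases its input, decoded is equal to damn_string.lower() rather than to the original string. The 8-digit pattern also only covers characters that fit in a single UTF-16 code unit, which is the same limitation the original script has.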