Question

I have a string:

alphabet = unicode("abś", 'utf-8');

How can I convert it to the following?

"ab\\u015B";

I tried this code:

al = ""
for char in alphabet:
    if ord(char) > 127:
        al += "\\u" + format(ord(char), 'x')
    else:
        al += char

But when I build a regular expression from it, it does not match the right characters:

abc = re.compile(u'[' + al + u']{1,}$', re.U).match

Demo here: http://ideone.com/q55HI8

Answer 1

>>> u"abś".encode('unicode-escape')
'ab\\u015b'
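
The transcript above is Python 2. As a rough sketch of the same conversion on Python 3 (an assumption on my part, since the question does not say which version is targeted): there `str.encode` returns `bytes`, so the result has to be decoded back to text.

```python
# Python 3 sketch (assumed version; the original transcript is Python 2).
alphabet = "abś"

# encode() yields bytes such as b'ab\\u015b'; decode to get a str again.
escaped = alphabet.encode("unicode-escape").decode("ascii")

print(escaped)  # ab\u015b
```

Note that this codec writes characters below U+0100 as `\xNN` rather than `\uNNNN`, so the output is not always in the four-digit form.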

I think you can get what you want without changing the original unicode string:

>>> abc = re.compile(u"[abś]{1,}$", re.U).match
>>> abc(u"ś")
<_sre.SRE_Match object at 0x89f31a8>
>>> abc(u"ś").group()
u'\u015b'
>>> abc(u"a").group()
u'a'
>>> abc("x") is None
True
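
If the alphabet comes from a variable rather than a literal, the same idea can be sketched as below. This assumes Python 3, and `make_matcher` is a hypothetical helper name, not something from the question. The characters go into the character class as they are, and `re.escape` only guards the ones that are special inside `[...]`.

```python
import re

def make_matcher(alphabet):
    # Hypothetical helper: build "[<alphabet>]+$" from a plain string.
    # re.escape protects characters such as ] ^ - \ that would otherwise
    # change the meaning of the character class.
    char_class = "[" + re.escape(alphabet) + "]"
    return re.compile(char_class + "+$").match

abc = make_matcher("abś")
print(abc("ś").group())   # ś
print(abc("x") is None)   # True
```

No manual `\uXXXX` escaping is needed this way, which also sidesteps the missing zero padding in `format(ord(char), 'x')` for code points below U+1000.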

Create regex from string with unicode characters
