Question

I have a string like this:

str1 = "heylisten\uff08there is something\uff09to say \uffa9"

I need to replace each Unicode value matched by my regex with the same value surrounded by a space on each side.

Desired output string:

out = "heylisten \uff08 there is something \uff09 to say  \uffa9 "

I use re.findall to get all the matches and then replace them. It looks like this:

p1 = re.findall(r'\uff[0-9a-e][0-9]', str1, flags = re.U)  
out = str1
for item in p1:
    print item
    print out
    out= re.sub(item, r" " + item + r" ", out)

Output:

'heylisten\\ uff08 there is something\\ uff09 to say \\ uffa9 '

What is wrong with the above, so that it prints an extra "\" and separates it from uff? I even tried re.search, but it only seems to separate \uff08. Is there a better way?

Answer 1

print re.sub(r"(\\uff[0-9a-e][0-9])", r" \1 ", str1)

You can use re.sub directly like this. See the demo.

http://regex101.com/r/sU3fA2/67

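import re
# Python 2: test_str is a byte string, so \uff08 stays as literal text
p = re.compile(r'(\\uff[0-9a-e][0-9])')
test_str = "heylisten\uff08there is something\uff09to say \uffa9"
# raw string, so \1 is a group reference and not the character \x01
subst = r" \1 "

result = re.sub(p, subst, test_str)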

Output:

heylisten \uff08 there is something \uff09 to say  \uffa9
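For readers on Python 3, the same one-pass substitution can be sketched as below. This is only an illustration under one assumption: the input holds literal backslash-u text (six separate characters per escape), which a raw string reproduces on Python 3. The names `escape_pattern` and `pad_escapes` are made up for this example.

```python
import re

# Assumption: the text contains literal escape sequences (backslash, u, f, f, 0, 8),
# not real Unicode characters. A raw string keeps them literal on Python 3.
text = r"heylisten\uff08there is something\uff09to say \uffa9"

# \\ matches one literal backslash; the group captures the whole escape text.
escape_pattern = re.compile(r"(\\uff[0-9a-e][0-9])")


def pad_escapes(s):
    """Surround every matched escape sequence with one space on each side."""
    return escape_pattern.sub(r" \1 ", s)


padded = pad_escapes(text)
print(padded)
```

Because re.sub handles every match in a single pass, there is no need for the findall loop, and no match gets substituted twice.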

Answer 2

I have a string like this:
str1 = "heylisten\uff08there is something\uff09to say \uffa9"
I need to replace the unicode values...

You don't have any unicode values. You have a byte string.
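The difference between escape text and a real character can be illustrated with a short Python 3 sketch. It assumes a raw string is a fair stand-in for what a Python 2 byte string literal holds; the variable names are hypothetical.

```python
# A raw string keeps the backslash and the letters as six separate characters,
# much like a Python 2 byte string literal would.
literal_text = r"\uff08"

# A normal Python 3 string turns the escape into one Unicode character.
real_char = "\uff08"

print(len(literal_text))    # 6
print(len(real_char))       # 1
print(hex(ord(real_char)))  # 0xff08
```

A pattern written for one form will not match the other, which is why it matters which kind of string the input is.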

str1 = u"heylisten\uff08there is something\uff09to say \uffa9"
 ...
p1 = re.sub(ur'([\uff00-\uffe9])', r' \1 ', str1)
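On Python 3 the `u` and `ur` prefixes are unnecessary, because every `str` is already Unicode (and `ur` is no longer accepted). A minimal sketch of the same idea, assuming Python 3 and using the character range from the snippet above; `wide_form_pattern` and `pad_wide_forms` are illustrative names:

```python
import re

# On Python 3 each \uXXXX escape in a normal string is one real character.
text = "heylisten\uff08there is something\uff09to say \uffa9"

# Character class covering U+FF00 through U+FFE9, the range used in the answer.
wide_form_pattern = re.compile("([\uff00-\uffe9])")


def pad_wide_forms(s):
    """Put one space on each side of every character in the range."""
    return wide_form_pattern.sub(r" \1 ", s)


padded = pad_wide_forms(text)
print(padded)
```

Each of the three matched characters gains two spaces, so `padded` is six characters longer than `text`.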

Unicode replacement using regex, Python
