Regex replace to correct spacing

Time: 2011-08-03 03:36:53

Tags: python regex replace str-replace cjk

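I need help with a regex problem involving Chinese characters in Python.

"拉柏多公园" is the correct form of the word, but in the text I find it broken up by spaces, as "拉柏 多公 园". What regex should I use to replace these characters?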

import re

name = "拉柏多公园"
line = "whatever whatever it is then there comes a 拉柏 多公 园 sort of thing"
line2 = "whatever whatever it is then there comes another拉柏 多公 园 sort of thing"
line3 = "whatever whatever it is then there comes yet another 拉柏 多公 园sort of thing"
line4 = "whatever whatever it is then there comes a拉柏 多公 园sort of thing"

firstchar = "拉"
lastchar = "园"

I need to replace the string in each line so that the output lines look like this:

line = "whatever whatever it is then there comes a 拉柏多公园 sort of thing"
line2 = "whatever whatever it is then there comes another 拉柏多公园 sort of thing"
line3 = "whatever whatever it is then there comes yet another 拉柏多公园 sort of thing"
line4 = "whatever whatever it is then there comes a 拉柏多公园 sort of thing"

I tried the following, but the regex is badly constructed:

reline = line.replace (r"firstchar*lastchar", name) #
reline2 = reline.replace ("  ", " ")
print reline2

Can someone help me correct my regex?

Thanks

1 Answer:

Answer 0: (score: 4)

(I am assuming you are using Python 3, since you use Unicode characters in regular strings. For Python 2, add u before each string literal.)

Python 3

import re

name = "拉柏多公园"
# the string of Chinese characters, with any number of spaces interspersed.
# The regex will match any surrounding spaces.
regex = r"\s*拉\s*柏\s*多\s*公\s*园\s*"

So you can replace each string with:

reline = re.sub(regex, ' ' + name + ' ', line)
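
As a quick check, the Python 3 approach above can be run against all four lines from the question. This is only a sketch: building the pattern from name with re.escape and a join is an assumption added here for convenience (the answer writes the pattern out by hand), and the names inner, lines, and fixed are illustrative.

```python
import re

name = "拉柏多公园"

# Assumption: generate the pattern from the name instead of typing it by hand.
# Each character is escaped and joined with \s*, so any amount of whitespace
# between the characters (and around the whole word) is matched.
inner = r"\s*".join(re.escape(ch) for ch in name)
regex = r"\s*" + inner + r"\s*"

lines = [
    "whatever whatever it is then there comes a 拉柏 多公 园 sort of thing",
    "whatever whatever it is then there comes another拉柏 多公 园 sort of thing",
    "whatever whatever it is then there comes yet another 拉柏 多公 园sort of thing",
    "whatever whatever it is then there comes a拉柏 多公 园sort of thing",
]

# Replace each spaced-out occurrence with the correct word, padded by one space.
fixed = [re.sub(regex, " " + name + " ", line) for line in lines]

for line in fixed:
    print(line)
```

Each printed line should contain 拉柏多公园 with exactly one space on either side, matching the desired output listed in the question.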

Python 2

# -*- coding: utf-8 -*-

import re

name = u"拉柏多公园"
# the string of Chinese characters, with any number of spaces interspersed.
# The regex will match any surrounding spaces.
regex = ur"\s*拉\s*柏\s*多\s*公\s*园\s*"

So you can replace each string with:

reline = re.sub(regex, u' ' + name + u' ', line)

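Discussion

The result will always be surrounded by spaces. More generally, if you want it to work at the beginning or end of a line, or before a comma or period, you will have to replace ' ' + name + ' ' with something more complex.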
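
One possible way to handle those cases is to pass a function to re.sub that decides, per match, whether a space is needed on each side. This is a minimal Python 3 sketch, not part of the original answer: the helper name fix_spacing and the punctuation set ",.;:!?" are assumptions chosen for illustration.

```python
import re

name = "拉柏多公园"
regex = re.compile(r"\s*拉\s*柏\s*多\s*公\s*园\s*")

def fix_spacing(text):
    """Hypothetical helper: normalise the word and the spacing around it."""
    def repl(match):
        # No leading space when the match begins at the start of the text.
        before = "" if match.start() == 0 else " "
        # No trailing space at the end of the text or right before punctuation.
        # The punctuation set is an assumption; extend it as needed.
        rest = text[match.end():]
        after = "" if rest == "" or rest[0] in ",.;:!?" else " "
        return before + name + after
    return regex.sub(repl, text)

print(fix_spacing("拉柏 多公 园 is nice"))
print(fix_spacing("we went to 拉柏 多公 园 ."))
print(fix_spacing("a拉柏 多公 园sort of thing"))
```

The replacement function inspects the text around each match, so the padding depends on context instead of being fixed at one space on both sides.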

Edit: Fixed. Of course, you have to use the re library functions.
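
To illustrate why the re functions are needed, here is a small Python 3 sketch comparing the attempt from the question with re.sub. str.replace treats its first argument as literal text, not as a pattern, and the variable names inside the string are never substituted. The names unchanged and changed are illustrative only.

```python
import re

line = "whatever whatever it is then there comes a 拉柏 多公 园 sort of thing"

# str.replace searches for the literal text "firstchar*lastchar",
# which does not occur in the line, so nothing is replaced.
unchanged = line.replace(r"firstchar*lastchar", "拉柏多公园")

# re.sub interprets the first argument as a regular expression.
changed = re.sub(r"拉\s*柏\s*多\s*公\s*园", "拉柏多公园", line)

print(unchanged == line)
print(changed)
```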