Question

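I need help with a regular expression problem involving Chinese characters in Python.

"拉柏多公园" is the correct form of the name, but in the text I found "拉柏 多公 园", with stray spaces inside it. What regular expression should I use to replace these characters?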

import re

name = "拉柏多公园"
line = "whatever whatever it is then there comes a 拉柏 多公 园 sort of thing"
line2 = "whatever whatever it is then there comes another拉柏 多公 园 sort of thing"
line3 = "whatever whatever it is then there comes yet another 拉柏 多公 园sort of thing"
line4 = "whatever whatever it is then there comes a拉柏 多公 园sort of thing"

firstchar = "拉"
lastchar = "园"

I need to replace the string in each line so that the output lines look like this:

line = "whatever whatever it is then there comes a 拉柏多公园 sort of thing"
line2 = "whatever whatever it is then there comes another 拉柏多公园 sort of thing"
line3 = "whatever whatever it is then there comes yet another 拉柏多公园 sort of thing"
line4 = "whatever whatever it is then there comes a 拉柏多公园 sort of thing"

I tried the following, but the regular expression is badly structured:

reline = line.replace (r"firstchar*lastchar", name) #
reline2 = reline.replace ("  ", " ")
print reline2

Can someone help me correct my regular expression?

Thanks

Answer 1

(I assume you are using Python 3, since you use Unicode characters in regular strings. For Python 2, add u before each string literal.)

Python 3

import re

name = "拉柏多公园"
# the string of Chinese characters, with any number of spaces interspersed.
# The regex will match any surrounding spaces.
regex = r"\s*拉\s*柏\s*多\s*公\s*园\s*"

You can then fix each line with:

reline = re.sub(regex, ' ' + name + ' ', line)
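As a quick check, here is a minimal sketch that applies the substitution to the four sample lines from the question. Building the pattern from `name` with `join` is my own convenience and not part of the answer above; it should match the same text as the hand-written regex.

```python
import re

name = "拉柏多公园"
# Optional whitespace between the characters and around the whole name.
# Matches the same text as r"\s*拉\s*柏\s*多\s*公\s*园\s*".
regex = r"\s*" + r"\s*".join(re.escape(ch) for ch in name) + r"\s*"

lines = [
    "whatever whatever it is then there comes a 拉柏 多公 园 sort of thing",
    "whatever whatever it is then there comes another拉柏 多公 园 sort of thing",
    "whatever whatever it is then there comes yet another 拉柏 多公 园sort of thing",
    "whatever whatever it is then there comes a拉柏 多公 园sort of thing",
]

# Replace every spaced-out occurrence with the name surrounded by single spaces.
fixed = [re.sub(regex, " " + name + " ", line) for line in lines]
for line in fixed:
    print(line)
```

Because the pattern also consumes the whitespace around the name, no second pass to collapse double spaces is needed.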

Python 2

# -*- coding: utf-8 -*-

import re

name = u"拉柏多公园"
# the string of Chinese characters, with any number of spaces interspersed.
# The regex will match any surrounding spaces.
regex = ur"\s*拉\s*柏\s*多\s*公\s*园\s*"

You can then fix each line with:

reline = re.sub(regex, u' ' + name + u' ', line)

Discussion

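The result will always be surrounded by spaces. More generally, if you want it to work at the beginning or end of a line, or before a comma or period, you will have to replace ' ' + name + ' ' with something more complex.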
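One possible way to handle those cases, as a rough sketch for Python 3: pass a function to `re.sub` and let it decide, per match, whether a space belongs on each side. The helper name `normalize_name` and the punctuation set `NO_SPACE_BEFORE` are hypothetical names I made up for illustration, and the set is certainly not exhaustive.

```python
import re

# Hypothetical set of characters that should not be preceded by a space.
NO_SPACE_BEFORE = set(",.;:!?)，。；：！？）")

def normalize_name(text, name):
    """Collapse spaced-out occurrences of `name`, adding spaces only where needed."""
    pattern = re.compile(
        r"\s*" + r"\s*".join(re.escape(ch) for ch in name) + r"\s*"
    )

    def repl(match):
        # No leading space when the match starts the line.
        left = " " if match.start() > 0 else ""
        # Look at the first character after the match, if there is one.
        following = match.string[match.end():match.end() + 1]
        # No trailing space at end of line or before punctuation.
        right = " " if following and following not in NO_SPACE_BEFORE else ""
        return left + name + right

    return pattern.sub(repl, text)

name = "拉柏多公园"
print(normalize_name("拉柏 多公 园 is nice", name))
print(normalize_name("I went to 拉柏 多公 园.", name))
print(normalize_name("see拉柏 多公 园 , ok", name))
```

This still inserts a space after an opening bracket or quote that directly precedes the name, so treat it as a starting point and not a complete solution.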

Edit: fixed. Of course, you have to use the re library functions; str.replace does not understand regular expressions.
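To illustrate the difference, here is a small sketch (Python 3) that passes the same pattern to both `str.replace` and `re.sub`. The variable names `unchanged` and `changed` are mine.

```python
import re

name = "拉柏多公园"
pattern = r"\s*拉\s*柏\s*多\s*公\s*园\s*"
line = "whatever whatever it is then there comes a 拉柏 多公 园 sort of thing"

# str.replace searches for the literal characters of the pattern
# (backslashes, asterisks and all), which never occur in the line.
unchanged = line.replace(pattern, " " + name + " ")

# re.sub interprets the pattern as a regular expression.
changed = re.sub(pattern, " " + name + " ", line)

print(unchanged == line)
print(changed)
```

The same applies to the attempt in the question: inside a string literal, `firstchar` and `lastchar` are just letters, not references to the variables.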
