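I need help with a regular expression problem involving Chinese characters in Python.
"拉柏多公园" is the correct form of the word, but in the text I find it broken up by spaces, as in "拉柏 多公 园". What regular expression should I use to replace these strings?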
import re
name = "拉柏多公园"
line = "whatever whatever it is then there comes a 拉柏 多公 园 sort of thing"
line2 = "whatever whatever it is then there comes another拉柏 多公 园 sort of thing"
line3 = "whatever whatever it is then there comes yet another 拉柏 多公 园sort of thing"
line4 = "whatever whatever it is then there comes a拉柏 多公 园sort of thing"
firstchar = "拉"
lastchar = "园"
I need to replace the string in each line so that the output lines look like this:
line = "whatever whatever it is then there comes a 拉柏多公园 sort of thing"
line2 = "whatever whatever it is then there comes another 拉柏多公园 sort of thing"
line3 = "whatever whatever it is then there comes yet another 拉柏多公园 sort of thing"
line4 = "whatever whatever it is then there comes a 拉柏多公园 sort of thing"
I tried the following, but my regex is badly structured:
reline = line.replace (r"firstchar*lastchar", name) #
reline2 = reline.replace (" ", " ")
print reline2
Can someone help me correct my regex?
Thanks
Answer 0 (score: 4)
(I assume you are using Python 3, since you use Unicode characters in regular strings. For Python 2, add u before each string literal.)
import re
name = "拉柏多公园"
# the string of Chinese characters, with any number of spaces interspersed.
# The regex will match any surrounding spaces.
regex = r"\s*拉\s*柏\s*多\s*公\s*园\s*"
So you can replace each string with:
reline = re.sub(regex, ' ' + name + ' ', line)
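As a quick check, here is a minimal sketch (assuming Python 3) that applies this substitution to the four sample lines from the question. The list names `lines` and `fixed` are introduced here only for illustration.

```python
import re

name = "拉柏多公园"
# The target characters with any amount of whitespace between and around them.
regex = r"\s*拉\s*柏\s*多\s*公\s*园\s*"

lines = [
    "whatever whatever it is then there comes a 拉柏 多公 园 sort of thing",
    "whatever whatever it is then there comes another拉柏 多公 园 sort of thing",
    "whatever whatever it is then there comes yet another 拉柏 多公 园sort of thing",
    "whatever whatever it is then there comes a拉柏 多公 园sort of thing",
]

# Replace each match, including its surrounding whitespace, with the
# clean name padded by exactly one space on each side.
fixed = [re.sub(regex, " " + name + " ", line) for line in lines]

for line in fixed:
    print(line)
```

Because the pattern also consumes the whitespace around the word, all four variants (with or without a space before or after) end up with a single space on each side.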
For Python 2:
# -*- coding: utf-8 -*-
import re
name = u"拉柏多公园"
# the string of Chinese characters, with any number of spaces interspersed.
# The regex will match any surrounding spaces.
regex = ur"\s*拉\s*柏\s*多\s*公\s*园\s*"
So you can replace each string with:
reline = re.sub(regex, u' ' + name + u' ', line)
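The result will be surrounded by spaces. More generally, if you want this to work at the start or end of a line, or before a comma or period, you will have to replace ' ' + name + ' ' with something more sophisticated.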
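One possible way to handle those cases, offered only as a sketch under assumptions and not as part of the original answer, is to pass a function to `re.sub` that decides the padding from the characters next to each match. The helper `normalize` and the punctuation sets `NO_SPACE_BEFORE` and `NO_SPACE_AFTER` are hypothetical names chosen for this example, and the sets are deliberately small. Python 3 is assumed.

```python
import re

name = "拉柏多公园"

# Build the pattern from `name`: optional whitespace between the characters,
# plus any whitespace around the whole word.
pattern = re.compile(
    r"\s*" + r"\s*".join(re.escape(ch) for ch in name) + r"\s*"
)

# Illustrative punctuation sets (assumptions); extend them as needed.
NO_SPACE_BEFORE = ",.;:!?)，。；：！？）"  # no padding before these
NO_SPACE_AFTER = "(（"                     # no padding after these


def normalize(text):
    def repl(match):
        s, start, end = match.string, match.start(), match.end()
        # Pad on the left unless the match starts the text
        # or follows an opening bracket.
        left = "" if start == 0 or s[start - 1] in NO_SPACE_AFTER else " "
        # Pad on the right unless the match ends the text
        # or is followed by punctuation.
        right = "" if end == len(s) or s[end] in NO_SPACE_BEFORE else " "
        return left + name + right

    return pattern.sub(repl, text)


print(normalize("拉柏 多公 园 is nice"))
print(normalize("I went to拉 柏多公园."))
```

Since the pattern swallows the surrounding whitespace, the character just outside the match is never a space, so the callback only has to check for the text boundaries and the listed punctuation.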
Edit: fixed. Of course, you have to use the re library functions.
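To illustrate why the `re` functions are needed, here is a small sketch (assuming Python 3) contrasting the attempt from the question with `re.sub`. The variable names `unchanged` and `fixed` are introduced only for this example.

```python
import re

name = "拉柏多公园"
line = "whatever whatever it is then there comes a 拉柏 多公 园 sort of thing"

# str.replace does no regex matching and does not substitute variables:
# it searches for the literal text "firstchar*lastchar", which never
# occurs in the line, so the line comes back unchanged.
unchanged = line.replace(r"firstchar*lastchar", name)
print(unchanged == line)

# re.sub interprets the pattern as a regular expression.
fixed = re.sub(r"\s*拉\s*柏\s*多\s*公\s*园\s*", " " + name + " ", line)
print(fixed)
```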