test_text = "AirMail from cairnsReceived but NOTavailable at the postOFFICE"
I want to separate the words that have been joined together and print the same string as:
print test_text
test_text = "Air Mail from cairns Received but NOT available at the post OFFICE"
I tried the following code, but it does not quite do what I want:
cleaned_text1 = re.sub(r'([A-Z][^A-Z]*)', r' \1', test_text)
print cleaned_text1
I get the following output:
" Air Mail from cairns Received but  N O Tavailable at the post O F F I C E"
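Answer 0 (score: 0)
The following code should be enough for your needs.
If you also need to split on symbols, though, that is another story... I have attached some test cases to show what this approach cannot handle. In short, it does not deal with trailing whitespace, or with two or more spaces between words and symbols.
It still needs improvement, because I use a while loop in the splitWords function to make up for a shortcoming in the regular expression.
Hope it helps.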
import re

def subFunc(matchobj):
    # Insert a space at the first lower/upper case boundary inside the match.
    text = matchobj.group(0)
    for c in range(len(text) - 1):
        if text[c].isupper() != text[c + 1].isupper():
            return ' '.join([text[:c + 1], text[c + 1:]])

def splitWords(test_text):
    pattern = r'([a-z][A-Z])|([A-Z]{2,}[a-z])'
    cleaned_text1 = re.sub(pattern, subFunc, test_text)
    # Repeat until nothing changes: a single pass misses a boundary whose
    # first character was consumed by the previous match ("iL" after "OFFICEi").
    while test_text != cleaned_text1:
        test_text = cleaned_text1
        cleaned_text1 = re.sub(pattern, subFunc, test_text)
    print(cleaned_text1)

test_text = "AirMail from cairnsReceived but NOTavailable at the postOFFICE"
goal_text = "Air Mail from cairns Received but NOT available at the post OFFICE"
splitWords(test_text)
# Air Mail from cairns Received but NOT available at the post OFFICE
test_text = "AirMail from cairnsReceived but NOTavailable at the postOFFICEiLovePython"
splitWords(test_text)
# Air Mail from cairns Received but NOT available at the post OFFICE i Love Python
test_text = "AirMailFrOm cairnsReceived butNOTavailable at the postOFFICEiLovePYTHON"
splitWords(test_text)
# Air Mail Fr Om cairns Received but NOT available at the post OFFICE i Love PYTHON
test_text = " Air..Mail From cairnsReceived butNOTavailable at the postOFFICEiLovePYTHON"
splitWords(test_text)
# Air..Mail From cairns Received but NOT available at the post OFFICE i Love PYTHON
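
The while loop is needed because each match consumes the characters on both sides of a boundary. As a possible alternative, here is a minimal sketch that uses zero-width lookarounds so that a single pass is enough. The name split_joined_words is hypothetical and not part of the answer above, and the sketch has only been checked against the first three test strings, so treat it as an illustration rather than a drop-in replacement.

```python
import re

# Zero-width boundaries: lower -> upper ("rM"), or a run of at least two
# capitals followed by a lowercase letter ("NOT|a"). Nothing is consumed,
# so neighbouring boundaries cannot hide each other.
BOUNDARY = re.compile(r'(?<=[a-z])(?=[A-Z])|(?<=[A-Z]{2})(?=[a-z])')

def split_joined_words(text):
    """Return text with a space inserted at every detected word boundary."""
    return BOUNDARY.sub(' ', text)

print(split_joined_words(
    "AirMail from cairnsReceived but NOTavailable at the postOFFICEiLovePython"))
```

Like the original, this only looks at ASCII letters and does nothing about symbols or repeated spaces.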