import re
import fileinput
#regex used
#result = re.split('(?<=\S)[^-][ ](?=[a-zA-Z0-9])', line)
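<---- This splits across multiple lines, but one character is lost at each split and the result is not quite right, so after a lot of searching I had to add the "$" as shown below: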
result = re.split('(?<=\S$)[^-][ ](?=[a-zA-Z0-9])', line)
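<---- This gives a good result, but now some words that should be separate are stuck together. I know which letter comes before the capital letter, for example "***** J". I first need to get "***** J", then get the pieces back on new lines, something like Sentence1 Sentence2 Sentence2, and then I am done! I am having trouble using re.sub and then putting everything on new lines, as in the final output I want.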
line = "WordsAreStickedTogetherHereIneedOneSpaceBetweeeThem"
result = re.split('(?<=\S$)[^-][ ](?=[a-zA-Z0-9])', line)
final_result = re.sub('dM','d M',result)
final_result = re.sub('dJ','d J',result)
for elem in final_result:
print elem
ERROR:
$python main.py
Traceback (most recent call last):
File "main.py", line 22, in <module>
final_result = re.sub('dC','d C',result)
File "/usr/lib64/python2.7/re.py", line 155, in sub
return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or buffer
Answer 0 (score: 3)
If you only need to split the words (a word being a capital letter followed by lowercase letters), you can simply use re.finditer:
line = "WordsAreStickedTogetherHereINeedOneSpaceBetweeeThem"
matches = re.finditer("[A-Z][a-z]*", line)
new_line = " ".join(match.group() for match in matches)
The variable new_line contains:
>>> print(new_line)
Words Are Sticked Together Here I Need One Space Betweee Them
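As for the TypeError in the question: re.split returns a list, while re.sub expects a single string, which is why passing result straight to re.sub fails. Note also that the second re.sub call in the question starts again from result, so it would discard whatever the first call produced. A minimal sketch of one way around both issues is below. The pieces list and the separate helper are made up for illustration and are not taken from the question's real data.

import re

# Hypothetical stand-in for the list that re.split returns in the question.
pieces = ["endM start", "endJ start", "unchanged"]

def separate(text):
    # Chain the substitutions so the second call works on the output of the first.
    text = re.sub('dM', 'd M', text)
    text = re.sub('dJ', 'd J', text)
    return text

# re.sub needs a string, so call it once per list element.
final_result = [separate(piece) for piece in pieces]

for elem in final_result:
    print(elem)

Each element is now a string with the space inserted, so the loop prints one piece per line.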