Using regex and Python to arrange text file output with regex.sub

Date: 2019-04-07 13:08:56

Tags: python regex file text

import re
import fileinput

#regex used
#result = re.split('(?<=\S)[^-][ ](?=[a-zA-Z0-9])', line)

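<---- This splits the text onto multiple lines, but one character is lost at each split and the result is not quite right, so after a lot of searching I had to add a "$" as shown below: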

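result = re.split('(?<=\S$)[^-][ ](?=[a-zA-Z0-9])', line) <---- This gives a good result, but now some words that should be separate are stuck together. I know which letter comes before the capital letter, for example " ***** J", so I first need to get " ***** J" and then put items such as Sentence1 Sentence2 Sentence2 back on separate lines, and then I am done! I am having trouble using re.sub and then putting everything on new lines, as in the final output I want.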

line = "WordsAreStickedTogetherHereIneedOneSpaceBetweeeThem"

result = re.split('(?<=\S$)[^-][ ](?=[a-zA-Z0-9])', line)

final_result = re.sub('dM','d M',result)
final_result = re.sub('dJ','d J',result)

for elem in final_result:
        print elem


ERROR:

$python main.py
Traceback (most recent call last):
  File "main.py", line 22, in <module>
    final_result = re.sub('dC','d C',result)
  File "/usr/lib64/python2.7/re.py", line 155, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or buffer

1 Answer:

Answer 0 (score: 3)

If you only need to split the words (where a word is an uppercase letter followed by lowercase letters), you can simply use re.finditer:

import re

line = "WordsAreStickedTogetherHereINeedOneSpaceBetweeeThem"
# Each match is one capital letter followed by zero or more lowercase letters.
matches = re.finditer("[A-Z][a-z]*", line)
new_line = " ".join(match.group() for match in matches)

The variable new_line contains:

>>> print(new_line)
Words Are Sticked Together Here I Need One Space Betweee Them
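
As for the TypeError in the question: re.split returns a list, while re.sub expects a single string, so the substitution has to be applied to each element. The question's second re.sub call also starts again from result, which would discard the first substitution. Below is a minimal sketch of one way to handle both points. The function names add_spaces and split_camel and the sample list are hypothetical and used only for illustration; split_camel is an alternative to the finditer approach that uses re.sub with a zero-width lookahead.

```python
import re


def add_spaces(parts):
    """Apply the question's two substitutions to every string in a list."""
    fixed = []
    for part in parts:
        # r"d([MJ])" covers both 'dM' -> 'd M' and 'dJ' -> 'd J' in one pass,
        # so the second substitution no longer overwrites the first.
        fixed.append(re.sub(r"d([MJ])", r"d \1", part))
    return fixed


def split_camel(text):
    """Insert a space before every capital letter except one at the start."""
    return re.sub(r"(?<!^)(?=[A-Z])", " ", text)


# Hypothetical sample data standing in for the list returned by re.split.
parts = ["wordMore", "endJune", "unchanged"]
for elem in add_spaces(parts):
    print(elem)

print(split_camel("WordsAreStickedTogetherHereINeedOneSpaceBetweeeThem"))
```

Looping over the list keeps each piece on its own line when printed, which seems to be the final layout the question asks for.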