Question

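I'm working with a text in which every "\n" has been deleted, so that two words end up merged into one (for example, "I like bananasAnd this is a new line.And another one."). What I'd like to do now is tell Python to look for a lowercase letter followed by an uppercase letter, or a punctuation mark followed by an uppercase letter, and insert a space between them.

I thought this would be easy with regular expressions, but it isn't: I can't find an "insert" function or anything like it, and the string methods don't seem to help either. How do I do this? Any help would be greatly appreciated, I'm getting desperate here...

Thanks, Patrick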

Answer 1

Try the following:

re.sub(r"([a-z\.!?])([A-Z])", r"\1 \2", your_string)

For example:

import re
lines = "I like bananasAnd this is a new line.And another one."
print(re.sub(r"([a-z\.!?])([A-Z])", r"\1 \2", lines))
# I like bananas And this is a new line. And another one.

If you want to insert a newline instead of a space, change the replacement to r"\1\n\2".
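In case it helps to switch between the two separators, here is a minimal sketch of the same idea wrapped in a function. The names BOUNDARY, rejoin and sep are made up for this illustration; a callable replacement is used so that sep is inserted literally, whatever it contains.

```python
import re

# A lowercase letter or . ! ? directly followed by an uppercase letter.
BOUNDARY = re.compile(r"([a-z.!?])([A-Z])")

def rejoin(text, sep=" "):
    """Insert sep at every boundary found by BOUNDARY (hypothetical helper)."""
    return BOUNDARY.sub(lambda m: m.group(1) + sep + m.group(2), text)

lines = "I like bananasAnd this is a new line.And another one."
print(rejoin(lines))        # space-separated
print(rejoin(lines, "\n"))  # one sentence per line
```

Keep in mind that the pattern cannot tell a lost line break from ordinary camel case, so a word like "iPhone" would be split as well.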

Answer 2

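With re.sub you should be able to build a pattern that captures the lowercase letter and the uppercase letter and replaces them with the same two letters, but with a separator in between (the call below inserts a newline; put a space in the replacement if you want a space instead):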

import re
re.sub(r'([a-z][.?]?)([A-Z])', '\\1\n\\2', mystring)

Answer 3

You are looking for the sub function. See http://docs.python.org/library/re.html for the documentation.

Answer 4

Hmm, interesting. You can use regular expressions to replace text with the sub() function:

>>> import re
>>> string = 'fooBar'
>>> re.sub(r'([a-z][.!?]*)([A-Z])', r'\1 \2', string)
'foo Bar'
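For comparison, here is a sketch of a variant that uses lookarounds instead of capture groups, so nothing has to be put back in the replacement. This is not what the answer above does; it is a swapped-in alternative, and the helper name add_spaces is hypothetical.

```python
import re

def add_spaces(text):
    # Zero-width match: a position preceded by a lowercase letter or . ! ?
    # and followed by an uppercase letter. Only the gap itself is replaced.
    return re.sub(r"(?<=[a-z.!?])(?=[A-Z])", " ", text)

print(add_spaces("fooBar"))
```

Because the match consumes no characters, the replacement is just the separator, which some people find easier to read than backreferences.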

Answer 5

If you really don't have any capitals except at the start of a sentence, it is probably easiest to just loop over the string.

>>> import string
>>> s = "a word endsA new sentence"
>>> lastend = 0
>>> sentences = list()
>>> for i in range(0, len(s)):
...    if s[i] in string.ascii_uppercase:  # string.uppercase exists only in Python 2
...        sentences.append(s[lastend:i])
...        lastend = i
...
>>> sentences.append(s[lastend:])
>>> print(sentences)
['a word ends', 'A new sentence']
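The loop above can also be sketched as a reusable function. The name split_at_capitals is hypothetical, and two small changes are assumptions of this sketch rather than part of the answer: it uses str.isupper() instead of a lookup in the string module, and it skips the empty piece that would otherwise appear when the text itself starts with a capital.

```python
def split_at_capitals(s):
    """Split s before every uppercase character (hypothetical helper)."""
    sentences = []
    lastend = 0
    for i, ch in enumerate(s):
        # i > lastend avoids appending an empty string for a leading capital.
        if ch.isupper() and i > lastend:
            sentences.append(s[lastend:i])
            lastend = i
    sentences.append(s[lastend:])
    return sentences

print(split_at_capitals("a word endsA new sentence"))
```

As with the original loop, every capital starts a new piece, so this only fits text where capitals really do occur only at the start of a sentence.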

Answer 6

Here's another approach, which avoids regular expressions and doesn't use any imported libraries, just built-ins...

s = "I like bananasAnd this is a new line.And another one."
with_whitespace = ''
last_was_upper = True
for c in s:
    if c.isupper():
        if not last_was_upper:
            with_whitespace += ' '
        last_was_upper = True
    else:
        last_was_upper = False
    with_whitespace += c

print(with_whitespace)

Yields:

I like bananas And this is a new line. And another one.
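The same loop, sketched as a function that collects pieces in a list and joins them once at the end instead of growing a string with +=. The function name space_before_capitals is made up for this illustration; the logic is meant to match the loop above.

```python
def space_before_capitals(s):
    """Insert a space before a capital that follows a non-capital (hypothetical helper)."""
    parts = []
    last_was_upper = True  # treat the start like "after a capital": no leading space
    for c in s:
        if c.isupper() and not last_was_upper:
            parts.append(" ")
        last_was_upper = c.isupper()
        parts.append(c)
    return "".join(parts)

s = "I like bananasAnd this is a new line.And another one."
print(space_before_capitals(s))
```

Runs of capitals such as "NASA" are left intact, but a capital that already follows a space gets a second one, so this is best applied only to text where the separators are actually missing.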

Splitting merged words in Python
