How do I split a string only when certain exceptions are not present?

Time: 2019-05-01 20:07:14

Tags: python-3.x

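I have managed to write a program that splits a string by inserting a line break after ".", "!" and "?". That part works. I have also created a list of exceptions: if one of them occurs in the string, no line break should be inserted at that point. What I can't figure out is how to read that list and compare its entries against the target string so that the line break is skipped there.

I am new to programming and to Python 3. I tried to write an additional function called exception_finder, which should suppress the line break when it returns True, but this did not work.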

    sentence= "Hello. My name is George... Michael! Samuel Williams. alittlemouse"
    exception_1_3_char = [". a", ". b", ". c", ". d", ". e", ". f", ". g", ". h", ". i", ". j", ". k", ". l", ". m", ". n", ". o", ". p", ". q", ". r", ". s", ". t", ". u", ". v", ". w", ". x", ". y", ". z"]
    def sentence_splitter(target_sentence):
        target_sentence = list(target_sentence)
        for character in range(len(target_sentence)):
            if target_sentence[character:character+2] == list(". ") or target_sentence[character:character+2] == list("! ") or target_sentence[character:character+2] == list('? ') and exception_finder(target_sentence) == True:
                target_sentence[character:character+2] += list("\n")
                print(''.join(target_sentence))

    sentence_splitter(sentence)

    def exception_finder(target_sentence):
        target_sentence = list(target_sentence)
        for exception in range(len(exception_1_3_char)):
            if exception in target_sentence:
                return True

Current result:

    Hello. 
    My name is George... Michael! Samuel Williams. alittlemouse
    Hello. 
    My name is George... 
    Michael! Samuel Williams. alittlemouse
    Hello. 
    My name is George... 
    Michael! 
    Samuel Williams. alittlemouse
    Hello. 
    My name is George... 
    Michael! 
    Samuel Williams. 
    alittlemouse

Desired result:

    Hello
    My name is George...
    Michael!
    Samuel Williams. alittlemouse

1 answer:

Answer 0 (score: 2)

Given your requirements, you should use a regex with a lookahead:

    import re

    sentence = "Hello. My name is George... Michael! Samuel Williams. alittlemouse"
    # Raw string, so the regex escapes are not treated as string escapes
    res = re.split(r'[.!?]+(?!\s[a-z])', sentence)
    ##=> ['Hello', ' My name is George', ' Michael', ' Samuel Williams. alittlemouse']

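(?!\s[a-z]) is a negative lookahead: the preceding expression (i.e. [.!?]+) matches only if it is not followed by \s[a-z], that is, whitespace and then a lowercase letter. A lookahead consumes no characters, so the text it inspects stays in the result. (?:...), used in the variant below, is a non-capturing group, so it is not captured.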
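To see what the lookahead does, it can help to list the separators the pattern actually matches. This is only an illustrative sketch; the name separators is hypothetical and not part of the original answer.

```python
import re

sentence = "Hello. My name is George... Michael! Samuel Williams. alittlemouse"

# Runs of punctuation that are NOT followed by whitespace plus a lowercase letter.
pattern = re.compile(r'[.!?]+(?!\s[a-z])')

# Record where each separator starts and which text it consumed.
separators = [(m.start(), m.group()) for m in pattern.finditer(sentence)]
print(separators)
```

The '.' in front of " alittlemouse" does not appear in the list because the lookahead rejects it. None of the matches include the following space or letter, since a lookahead is zero-width.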

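If you want only a single '.' to be consumed as the separator, so that a run such as '...' keeps all but its last character, you can modify the regex slightly: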

    # One punctuation character, not followed by whitespace + lowercase letter or by another '.'
    res = re.split(r'[.!?](?!(?:\s[a-z]|\.))', sentence)
    ##=> ['Hello', ' My name is George..', ' Michael', ' Samuel Williams. alittlemouse']
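re.split drops the punctuation it splits on, whereas the question asks for line breaks with the punctuation kept. One possible way to get that, sketched here under the assumption that "exception" means "followed by a lowercase letter", is to replace only the whitespace after the punctuation with re.sub. The function name split_into_lines is hypothetical, and this is a swapped-in variant rather than the answer's own method.

```python
import re

def split_into_lines(text):
    """Put a newline after '.', '!' or '?' unless a lowercase letter follows.

    (?<=[.!?])    the whitespace must come right after sentence punctuation
    \\s+           the whitespace that gets replaced by the newline
    (?![a-z\\s])   skip it when a lowercase letter follows; \\s is excluded too,
                  so the match cannot backtrack into part of a whitespace run
    """
    return re.sub(r'(?<=[.!?])\s+(?![a-z\s])', '\n', text)

sentence = "Hello. My name is George... Michael! Samuel Williams. alittlemouse"
result = split_into_lines(sentence)
print(result)
```

Because both the lookbehind and the lookahead are zero-width, only the whitespace itself is replaced, so "George..." and "Michael!" keep their punctuation and "Samuel Williams. alittlemouse" stays on one line.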