Splitting a string by ending characters

Date: 2017-04-24 16:33:15

Tags: python arrays string list

A recent project requires me to split an incoming phrase (as a string) into its component sentences. For example, this string:

"Your mother was a hamster, and your father smelt of elderberries! Now go away, or I shall taunt you a second time. You know what, never mind. This entire sentence is far too silly. Wouldn't you agree? I think it is."

needs to be converted into a list made up of the following elements:

["Your mother was a hamster, and your father smelt of elderberries",
"Now go away, or I shall taunt you a second time",
"You know what, never mind",
"This entire sentence is far too silly",
"Wouldn't you agree",
"I think it is"]

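For the purposes of this function, a "sentence" is a string terminated by !, ?, or . (an exclamation mark, question mark, or period). Note that the punctuation should be removed from the output, as shown above.

I have a working version, but it is ugly, it leaves leading and trailing whitespace, and I can't help but think there is a better way: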

from functools import reduce

def split_sentences(st):
  if type(st) is not str:
    raise TypeError("Cannot split non-strings")
  sl = st.split('.')
  sl = [s.split('?') for s in sl]
  sl = reduce(lambda x, y: x+y, sl) #Flatten the list
  sl = [s.split('!') for s in sl]
  return reduce(lambda x, y: x+y, sl)

4 answers:

Answer 0 (score: 8)

Use re.split with a regular expression that matches any sentence-ending character (plus any whitespace that follows it).

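import re

def split_sentences(st):
    # Split on any terminator and swallow the whitespace after it
    sentences = re.split(r'[.?!]\s*', st)
    # If the input ends with a terminator, the last piece is empty: drop it
    if sentences[-1]:
        return sentences
    else:
        return sentences[:-1]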
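
As a quick check, the sketch below runs the same idea on a shortened version of the example from the question. It is a restatement under one assumption, not the answerer's exact code: empty pieces are filtered out with a list comprehension instead of testing only the last element, which also covers input that begins with a terminator.

```python
import re

def split_sentences(st):
    # Split on any terminator plus the whitespace after it,
    # then drop empty pieces (e.g. the one after the final terminator).
    return [s for s in re.split(r'[.?!]\s*', st) if s]

text = ("Your mother was a hamster, and your father smelt of elderberries! "
        "Now go away, or I shall taunt you a second time. "
        "You know what, never mind. Wouldn't you agree? I think it is.")

for sentence in split_sentences(text):
    print(sentence)
```

Each sentence is printed on its own line without its terminator. Leading whitespace at the very start of the input is not removed by this sketch; calling st.strip() first would handle that.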

Answer 1 (score: 1)

You can also do this without regular expressions:

result = [s.strip() for s in String.replace('!', '.').replace('?', '.').split('.')]

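Alternatively, you can write a more explicit loop that replaces the terminators in place in a list of characters, instead of creating a new string for each replace call: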

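# Strings are immutable, so work on a list of characters
chars = list(String)

for i in range(len(chars)):
    if chars[i] in ('?', '!'):
        chars[i] = '.'

# A list has no split() method: join the characters back into a string first
result = [s.strip() for s in ''.join(chars).split('.')]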
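
Another regex-free option, offered as a sketch rather than as part of the original answer, is to map both terminators to a period in a single pass with str.translate and then split. The names TERMINATORS and split_sentences_no_regex are hypothetical, and dropping empty pieces is an added assumption so that a trailing terminator does not leave an empty string in the result.

```python
# Translation table that maps '!' and '?' to '.'
TERMINATORS = str.maketrans('!?', '..')

def split_sentences_no_regex(text):
    # Normalise every terminator to '.', split, then trim each piece
    pieces = text.translate(TERMINATORS).split('.')
    # Keep only the pieces that still contain text after trimming
    return [p.strip() for p in pieces if p.strip()]

print(split_sentences_no_regex("Wouldn't you agree? I think it is. Now go away!"))
```

Like the variants above, this treats every period as a sentence boundary, so a number such as 0.50 would be split in two.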

Answer 2 (score: 1)

import re

st1 = "  Another example!! Let me contribute 0.50 cents here?? \
         How about pointer '.' character inside the sentence? \
         Uni Mechanical Pencil Kurutoga, Blue, 0.3mm (M310121P.33). \
         Maybe there could be a multipoint delimiter?.. Just maybe...  "

st2 = "One word"

def split_sentences(st):
    st = st.strip() + '. '
    sentences = re.split(r'[.?!][.?!\s]+', st)
    return sentences[:-1]

print(split_sentences(st1))
print(split_sentences(st2))
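
The pattern in this answer only treats a terminator as a sentence boundary when it is followed by more terminators or by whitespace, which is why 0.50 and P.33 stay intact. Appending '. ' guarantees a final match, so the last, empty piece can always be sliced off. A small check of that behaviour, using made-up inputs that are not from the original answer:

```python
import re

def split_sentences(st):
    # Force a final boundary so the last piece is always an empty string
    st = st.strip() + '. '
    # A boundary is a terminator followed by one or more terminators/whitespace
    return re.split(r'[.?!][.?!\s]+', st)[:-1]

print(split_sentences("It costs 0.50 cents. Really?! Yes..."))
print(split_sentences("One word"))
```

The period inside 0.50 is followed by a digit, so it never matches, while runs such as ?! and ... are consumed as a single boundary.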

Answer 3 (score: 0)

You can use a regular expression split to break the string on specific special characters.

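import re
# Named text rather than str, which would shadow the built-in type
text = "Your mother was a hamster, and your father smelt of elderberries! Now go away, or I shall taunt you a second time. You know what, never mind. This entire sentence is far too silly. Wouldn't you agree? I think it is."
# Split on a terminator followed by at least one whitespace character
print(re.compile(r'[?.!]\s+').split(text))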
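
Because this pattern requires whitespace after the terminator, the last sentence keeps its final punctuation mark. One way to handle that, sketched here as an assumption rather than as part of the original answer, is to strip trailing terminators from the input before splitting. SENTENCE_END and split_sentences are illustrative names.

```python
import re

# A terminator followed by at least one whitespace character
SENTENCE_END = re.compile(r'[?.!]\s+')

def split_sentences(text):
    # Remove surrounding whitespace and any trailing terminators first,
    # since the pattern cannot match a terminator at the very end of the input.
    return SENTENCE_END.split(text.strip().rstrip('?.!'))

print(split_sentences("Wouldn't you agree? I think it is."))
```

Requiring whitespace has the side benefit that a period inside a number, such as 0.3mm, is not treated as a sentence boundary.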