I have text like this:
"Mark

Whatever sentence
Whatever sentence 2

Matt

Whatever sentence 3
Whatever sentence 4

Carol

Whatever sentence 5
Whatever sentence 6"
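I want to be able to identify each sentence and then put each one in a list. How can I do this simply?
It doesn't matter whether Mark, Matt and Carol are identified and added to the list, because those names are always the same while the sentences may differ.
I've made a few attempts, but I can't figure out how to handle the blank lines...
Any help would be much appreciated, even if it's just a pointer in the right direction.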
Answer 0 (score: 2)
In Python, you don't need a regular expression for this.
Just use splitlines:
>>> text = """Mark

Whatever sentence
Whatever sentence 2

Matt

Whatever sentence 3
Whatever sentence 4

Carol

Whatever sentence 5
Whatever sentence 6"""
>>> sentences = text.splitlines()
>>> sentences
['Mark', '', 'Whatever sentence', 'Whatever sentence 2', '', 'Matt', '', 'Whatever sentence 3', 'Whatever sentence 4', '', 'Carol', '', 'Whatever sentence 5', 'Whatever sentence 6']
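As a side note on why splitlines is a reasonable choice here, the short sketch below compares it with split("\n"). The sample string is made up for illustration (it is not from the question) and assumes the input might use Windows line endings and end with a trailing newline.

```python
# Hypothetical sample with Windows-style line endings and a trailing newline.
sample = "Mark\r\n\r\nWhatever sentence\r\n"

# splitlines() recognizes \n, \r\n and \r, and does not add an empty
# string for the final line break.
by_splitlines = sample.splitlines()

# split("\n") only splits on \n, so each piece keeps its \r and a trailing
# empty string appears at the end.
by_split = sample.split("\n")

print(by_splitlines)
print(by_split)
```

If the text always uses plain \n and has no trailing newline, both calls give the same list.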
Then use filter to remove all the empty lines:
>>> sentences = list(filter(None, sentences))
>>> sentences
['Mark', 'Whatever sentence', 'Whatever sentence 2', 'Matt', 'Whatever sentence 3', 'Whatever sentence 4', 'Carol', 'Whatever sentence 5', 'Whatever sentence 6']
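One caveat: filter(None, ...) drops only strings that are truly empty. A line that contains just spaces or tabs is still truthy and would survive. The question does not say whether such lines occur, so the following is only a sketch under that assumption, using a made-up input:

```python
# Hypothetical input where one "blank" line actually contains spaces.
text = "Mark\n   \nWhatever sentence\n\nWhatever sentence 2"

# filter(None, ...) keeps the whitespace-only line because "   " is truthy.
kept_by_filter = list(filter(None, text.splitlines()))

# Stripping each line first removes both empty and whitespace-only lines.
stripped = [line.strip() for line in text.splitlines() if line.strip()]

print(kept_by_filter)
print(stripped)
```

The list comprehension also trims leading and trailing whitespace from the lines it keeps, which may or may not be wanted.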
If by "list each sentence" you mean splitting each sentence into words, you can do this:
>>> sentences = [sentence.split() for sentence in sentences]
>>> sentences
[['Mark'], ['Whatever', 'sentence'], ['Whatever', 'sentence', '2'], ['Matt'], ['Whatever', 'sentence', '3'], ['Whatever', 'sentence', '4'], ['Carol'], ['Whatever', 'sentence', '5'], ['Whatever', 'sentence', '6']]
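The question also mentions that the names are always the same and do not need to end up in the list. One possible way to leave them out, assuming the set of names is known in advance, is sketched below. `NAMES` and `only_sentences` are illustrative names, not part of the original answer.

```python
# Assumption: the speaker names are fixed and known ahead of time.
NAMES = {"Mark", "Matt", "Carol"}

def only_sentences(text):
    """Return non-blank lines that are not one of the known names."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and line not in NAMES]

text = """Mark

Whatever sentence
Whatever sentence 2

Matt

Whatever sentence 3
"""

print(only_sentences(text))
```

This would also drop a sentence that consists solely of one of the names, so it only fits if that cannot happen in the real data.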
Answer 1 (score: 1)