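I have a paragraph in a text file that I want to display in Python, but I want a new line to start after each full stop (period).
For example, my paragraph is: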
"Dr. Harrison bought bargain.co.uk for 2.5 million pounds, i.e. he
paid a lot for it. Did he mind? John Smith, Esq. thinks he didn't.
Nevertheless, this isn't true... Well, with a probability of .9 it
isn't."
but I want it displayed as follows:
"Dr. Harrison bought bargain.co.uk for 2.5 million pounds, i.e. he
paid a lot for it.
Did he mind? John Smith, Esq. thinks he didn't.
Nevertheless, this isn't true...
Well, with a probability of .9 it isn't."
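This gets harder because of the other periods that occur inside the sentences, e.g. in the website address, "Dr.", "Esq.", ".9", and of course the first two dots of the ellipsis.
I don't know how to handle these other periods in the text file. Can anyone help? Thanks.
"Your task is to write a program that, given the name of a text file, is able to write its content with each sentence on a separate line." <- the task set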
Answer 0 (score: 4)
This should solve your problem:
import re

text = "Dr. Harrison bought bargain.co.uk for 2.5 million pounds, i.e. he "\
"paid a lot for it. Did he mind? John Smith, Esq. thinks he didn't. "\
"Nevertheless, this isn't true... Well, with a probability of .9 it "\
"isn't."

# A period followed by spaces and a capital letter ends a sentence,
# unless the period belongs to "Dr." or "Esq." (negative lookbehinds).
pat = r'(?<!Dr)(?<!Esq)\. +(?=[A-Z])'
print(re.sub(pat, '.\n', text))
Result:
Dr. Harrison bought bargain.co.uk for 2.5 million pounds, i.e. he paid a lot for it.
Did he mind? John Smith, Esq. thinks he didn't.
Nevertheless, this isn't true...
Well, with a probability of .9 it isn't.
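However, for something as complex as human writing, no regex can be guaranteed never to fail.
Note, for example, that I had to use a negative lookbehind assertion to exclude the case of "Dr." (I did the same for "Esq.", although it does not cause a problem in your text, because the word that follows it, "thinks", does not begin with a capital letter.)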
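To make the effect of the negative lookbehind concrete, here is a minimal sketch comparing the pattern with and without it. The `sample` string and the variable names are made up for illustration and are not part of the original answer.

```python
import re

# Hypothetical sample sentence, chosen only to show the difference.
sample = "Dr. Harrison paid a lot. Did he mind?"

# Without the lookbehind, the period after "Dr" is treated as a sentence end.
naive = re.sub(r'\. +(?=[A-Z])', '.\n', sample)

# With (?<!Dr), a period directly preceded by "Dr" is skipped.
guarded = re.sub(r'(?<!Dr)\. +(?=[A-Z])', '.\n', sample)

print(naive)
print(guarded)
```

The first call breaks the line right after "Dr.", while the second only breaks after "a lot.".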
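I don't think it is possible to build every such case into the regex pattern in advance; some unforeseen case will always turn up one day.
Still, this code does much of what you want. Not bad, in my opinion.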
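Since the task asks for a program that takes a file name, the same pattern could be wrapped roughly as follows. This is only a sketch: the function names `split_sentences` and `print_sentences` are invented here, and joining hard-wrapped lines before splitting is an assumption about how the input file is laid out.

```python
import re
from pathlib import Path

# Same pattern as in the answer above: a period not preceded by "Dr" or
# "Esq", followed by one or more spaces and then a capital letter.
SENTENCE_END = re.compile(r'(?<!Dr)(?<!Esq)\. +(?=[A-Z])')

def split_sentences(text):
    """Return text with a newline after each detected sentence end."""
    # Assumption: the file is hard-wrapped, so collapse all whitespace
    # (including line breaks) into single spaces first.
    flat = ' '.join(text.split())
    return SENTENCE_END.sub('.\n', flat)

def print_sentences(filename):
    """Read the named file and print one detected sentence per line."""
    print(split_sentences(Path(filename).read_text(encoding='utf-8')))
```

Like the original pattern, this only breaks after a period, so "Did he mind? John Smith, Esq. thinks he didn't." stays on one line, as in the output requested in the question.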
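Answer 1 (score: 1)
You could add a line break if and only if the period is followed by a space and a capital letter. That won't solve every case, but combined with a dictionary of exception words such as "Dr.", you can do quite well, though not perfectly.
Update: By dictionary I mean a Python dictionary and a word list like this one. I did not find any downloadable file containing the most common abbreviations, so I'm afraid you will have to make one yourself.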
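The approach described in this answer might be sketched as follows. The `ABBREVIATIONS` set is a small hand-made example rather than a complete list, and the function name `split_sentences` is invented for illustration.

```python
import re

# Hypothetical, hand-made exception list; extend it for your own texts.
ABBREVIATIONS = {'Dr', 'Mr', 'Mrs', 'Ms', 'Prof', 'Esq'}

# Candidate sentence end: a period, whitespace, then a capital letter.
CANDIDATE = re.compile(r'\.\s+(?=[A-Z])')

def split_sentences(text):
    """Split text at candidate periods unless they end a known abbreviation."""
    sentences = []
    start = 0
    for match in CANDIDATE.finditer(text):
        # Look at the word immediately before the candidate period.
        words = text[start:match.start()].split()
        last_word = words[-1] if words else ''
        if last_word in ABBREVIATIONS:
            continue  # known abbreviation, so not a sentence end
        sentences.append(text[start:match.start() + 1])
        start = match.end()
    if start < len(text):
        sentences.append(text[start:])  # whatever follows the last break
    return sentences

for sentence in split_sentences("Dr. Smith met Mr. Jones. They talked."):
    print(sentence)
```

Keeping the exceptions in a set rather than in the regex means new abbreviations can be added without touching the pattern, at the cost of still missing any abbreviation that is not listed.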