This is my full text:
RETENTION
Liability in excess of the Retention
The Retention shall be borne by the Named Insured and the Insurer shall only be liable for Loss once the Retention has been fully eroded. The Retention shall apply until such time as it has been fully eroded after which no Retention shall apply.
Erosion of the Retention
The Retention shall be eroded by Loss for which the Insurer would be liable under this Policy but for the Retention.
I want to extract the entire RETENTION section.
Here is my code for extracting the sentences that contain a specific word (here: RETENTION).
abc3 = [sentence + '.' for sentence in txt_trim_string.split('.') if 'RETENTION' in sentence]
But this gives the output:
RETENTION
Liability in excess of the Retention
The Retention shall be borne by the Named Insured and the Insurer shall only be liable for Loss once the Retention has been fully eroded.
I also want to include:
Erosion of the Retention
The Retention shall be eroded by Loss for which the Insurer would be liable under this Policy but for the Retention.
How can I do this?
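Answer 0 (score: 0)
You can try matching everything except a fully capitalized word. To find fully capitalized words, you can use the following regular expression: ([A-Z]){2,}
This expression matches runs of 2 or more adjacent uppercase letters.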
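As a rough sketch of how that pattern could serve the question's goal, the snippet below locates every all-caps run with `re.finditer` and slices the text from one heading to the next. The shortened sample text, the extra `DEFINITIONS` heading, and the helper name `split_by_headings` are assumptions made for illustration; none of them come from the original post. The pattern is written without the capturing group and with word boundaries added, which is also an assumption.

```python
import re

# Shortened sample text; the DEFINITIONS section is hypothetical and only
# shows where the RETENTION section would end.
text = (
    "RETENTION\n"
    "Liability in excess of the Retention\n"
    "The Retention shall be borne by the Named Insured.\n"
    "Erosion of the Retention\n"
    "The Retention shall be eroded by Loss.\n"
    "DEFINITIONS\n"
    "Loss means any amount payable.\n"
)


def split_by_headings(text):
    """Map each all-caps heading to the text that follows it (hypothetical helper)."""
    headings = list(re.finditer(r'\b[A-Z]{2,}\b', text))
    sections = {}
    for i, match in enumerate(headings):
        # A section runs until the next heading, or to the end of the text.
        end = headings[i + 1].start() if i + 1 < len(headings) else len(text)
        sections[match.group()] = text[match.end():end].strip()
    return sections


sections = split_by_headings(text)
print(sections["RETENTION"])
```

With the parenthesised form `([A-Z]){2,}`, `re.findall` would return only the last captured letter of each run, which is why the group is dropped here.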
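Another approach is to use the following regular expression: [A-Z]?[^A-Z]+
This matches zero or one uppercase letter followed by a run of characters that are not uppercase letters.
import re
text = "..."  # your text as a string
regex = r'[A-Z]?[^A-Z]+'
for result in re.findall(regex, text):
    print(result)  # the chunk right after a fully capitalized word starts with that word's last letter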
Answer 1 (score: 0)
Try the regular expression: [A-Z]{2,}.*?(?=(?:[A-Z]{2,}|\Z))
Use it with the re.DOTALL option so that . also matches newlines.
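A minimal sketch of this suggestion, assuming the document is held in a string named `text`. The variable names, the shortened sample text, and the trailing `DEFINITIONS` section are made up for the demonstration.

```python
import re

# Shortened sample text; the DEFINITIONS section is hypothetical and only
# shows where the RETENTION match stops.
text = (
    "RETENTION\n"
    "Liability in excess of the Retention\n"
    "The Retention shall be borne by the Named Insured.\n"
    "Erosion of the Retention\n"
    "The Retention shall be eroded by Loss.\n"
    "DEFINITIONS\n"
    "Loss means any amount payable.\n"
)

# An all-caps heading, then as little as possible (newlines included, thanks
# to re.DOTALL) up to the next all-caps run or the end of the string.
pattern = re.compile(r'[A-Z]{2,}.*?(?=(?:[A-Z]{2,}|\Z))', re.DOTALL)

sections = pattern.findall(text)
retention = next(s for s in sections if s.startswith("RETENTION"))
print(retention)
```

One caveat with this pattern: any run of two or more capital letters inside the body text, such as an acronym, would end the section early, so it works best when only headings are fully capitalized.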