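Match sentences in the middle of a block of text until I hit "Hello World"?

Time: 2018-05-26 20:52:25

Tags: python regex

So, let's say I have this block of text, and I want to match the text that comes before HELLO WORLD. What regex would be appropriate?

I tried this: Te pri\.[?=HELLO WORLD] but it didn't work.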

Lorem ipsum dolor sit amet, timeam evertitur ex eos, utamur temporibus disputationi eum te. 
Te pri dicant exerci nonumy, in case erat albucius mei.  
Pertinax periculis concludaturque eum te, et nam vero nominavi deterruisset. HELLO WORLD. 
Ex augue scriptorem pri. Vocent minimum quaerendum duo eu, habemus adipiscing ex eum.

Keep in mind that I'm still fairly new to regular expressions.

3 Answers:

Answer 0: (score: 1)

Use the following:

import re

text = '''Lorem ipsum dolor sit amet, timeam evertitur ex eos, utamur temporibus disputationi eum te. 
Te pri dicant exerci nonumy, in case erat albucius mei.  
Pertinax periculis concludaturque eum te, et nam vero nominavi deterruisset. HELLO WORLD. 
Ex augue scriptorem pri. Vocent minimum quaerendum duo eu, habemus adipiscing ex eu'''


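try:
    # (?s) lets '.' span newlines; group(1) runs from 'Te pri dicant' up to HELLO WORLD
    foundSubString = re.search(r'(?s)(Te\spri\sdicant.*?)HELLO WORLD', text).group(1)
except AttributeError:
    # re.search returned None, so nothing matched
    foundSubString = ''  # apply your error handling

print('Match Found:', foundSubString)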

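Answer 1: (score: 1)

What you're looking for is any character (.) occurring one or more times (+),

and you want to make sure another pattern follows without including it in the match, which is known as a "positive lookahead" (?=)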

.+(?=HELLO WORLD)
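
As a rough illustration of how this pattern behaves in Python, here is a minimal sketch. The shortened `text` and the variable names are made up for the example and are not from the question. Without any flags, `.` does not cross newlines, so the match stays on the line that contains HELLO WORLD.

```python
import re

# Shortened, hypothetical stand-in for the text block in the question
text = (
    "Lorem ipsum dolor sit amet.\n"
    "Te pri dicant exerci nonumy.\n"
    "Pertinax periculis deterruisset. HELLO WORLD.\n"
    "Ex augue scriptorem pri."
)

# The lookahead checks that HELLO WORLD follows, but does not consume it
m = re.search(r".+(?=HELLO WORLD)", text)
matched = m.group(0) if m else ""
print(repr(matched))
```

The lookahead only asserts that HELLO WORLD comes next, so it is left out of `m.group(0)`.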

Demo 1

If you also want to match across newlines, you can simply extend the meaning of . with the s flag/modifier.
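
In Python the s modifier corresponds to `re.DOTALL` (or the inline `(?s)`). The sketch below assumes a shortened, made-up `text`. Since `.+` is greedy and now spans lines, the match starts at the beginning of the text and runs up to the last HELLO WORLD.

```python
import re

# Shortened, hypothetical stand-in for the text block in the question
text = (
    "Lorem ipsum dolor sit amet.\n"
    "Te pri dicant exerci nonumy.\n"
    "Pertinax periculis deterruisset. HELLO WORLD.\n"
    "Ex augue scriptorem pri."
)

# re.DOTALL makes '.' match newline characters as well
m = re.search(r".+(?=HELLO WORLD)", text, flags=re.DOTALL)
before = m.group(0) if m else ""
print(before)
```

If the match should begin at a particular phrase rather than at the start of the text, that phrase can be put in front of the `.+`.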

Demo 2

Answer 2: (score: 1)

You want this regex:

(?s)(Te pri.*?)HELLO WORLD

Broken down, the parts of the expression mean:

(?s)   -- Make the '.' regex metacharacter match newlines too
(      -- Start a capturing group
Te pri -- Match exactly 'Te pri'
.      -- The dot metacharacter matches any character (newlines included, because of '(?s)')
*      -- Match the prior metacharacter, character class or group zero or more times
       -- By default will match as many times as possible
?      -- When paired with '*', it makes '*' match as few times as possible
       -- This way, '.*?' stops at the first 'HELLO WORLD' instead of running on to the last one
)      -- End the capturing group
HELLO WORLD -- Match the literal text (outside the group, so it is not captured)
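
To make the greedy versus lazy difference concrete, here is a small sketch. The `sample` string with two HELLO WORLD occurrences is invented for the illustration and is not the text from the question.

```python
import re

# Hypothetical sample containing HELLO WORLD twice
sample = "Te pri one HELLO WORLD two HELLO WORLD three"

# Greedy '.*' grabs as much as possible, so the group runs to the last HELLO WORLD
greedy = re.search(r"(?s)(Te pri.*)HELLO WORLD", sample).group(1)

# Lazy '.*?' grabs as little as possible, so the group stops at the first one
lazy = re.search(r"(?s)(Te pri.*?)HELLO WORLD", sample).group(1)

print(repr(greedy))
print(repr(lazy))
```

With only one HELLO WORLD in the text, both forms capture the same thing. The lazy form matters once the marker can appear more than once.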

Use .group() to access what was captured in the group, e.g.

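import re
regex = re.compile(r"(?s)(Te pri.*?)HELLO WORLD")
# search(), not match(): 'Te pri' does not sit at the very start of the text
m = regex.search(your_text)  # your_text is the string you want to scan
if m:
    print(m.group(1))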
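
One detail worth checking when running a snippet like this: a compiled pattern's `match()` only tries at the beginning of the string, while `search()` scans forward for the first place the pattern fits. The question's text begins with "Lorem ipsum", so the two behave differently. A minimal sketch, assuming a shortened, made-up `your_text`:

```python
import re

# Shortened, hypothetical stand-in for the text block in the question
your_text = "Lorem ipsum.\nTe pri dicant exerci.\nPertinax. HELLO WORLD."
regex = re.compile(r"(?s)(Te pri.*?)HELLO WORLD")

anchored = regex.match(your_text)   # tries only at position 0
scanned = regex.search(your_text)   # scans forward through the string

# Guard against None before calling .group()
captured = scanned.group(1) if scanned else ""
print(anchored)
print(repr(captured))
```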

Happy coding!