Regex to capture all text after observing 4 consecutive uppercase letters

Time: 2019-01-30 10:18:59

Tags: python regex

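Edit: the regex should not look for LONDON specifically (it could be PARIS, BELGIUM, etc.). It should be flexible, so that it matches everything that follows once it sees 4 consecutive capital letters.

Given the following text: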

    text text text, more text

   -- Some More Texty Text Text
      better manage their online privacy needs

   -- Another line of Text
      in foster programs

LONDON, UK. January 28, 2019--

More example of text, lots of text, Text text. Imagine this is a long article... blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah.

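I want a regex that extracts the LONDON line and all the text after it (the regex logic should identify this line by the fact that it starts with four or more consecutive uppercase letters). So the output should be: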

LONDON, UK. January 28, 2019--

More example of text, lots of text, Text text. Imagine this is a long article... blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah.

2 answers:

Answer 0: (score: 0)

(?:LONDON).*

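The pattern above grabs the word LONDON and everything after it. Note that the sample string in the code is flattened to a single line; by default `.` does not match newlines.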

import re
# Non-capturing group around the literal word, then everything after it.
pattern = r'(?:LONDON).*'
function_string = "text text text, more text -- Some More Texty Text Text better manage their online privacy needs  -- Another line of Text in foster programs  LONDON, UK. January 28, 2019-- More example of text, lots of text, Text text. Imagine this is a long article... blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah."

print(re.findall(pattern, function_string))

Output:

['LONDON, UK. January 28, 2019-- More example of text, lots of text, Text text. Imagine this is a long article... blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah.']
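The sample string above is a single line, which hides one detail: without `re.DOTALL`, `.` stops at the first newline. The sketch below shows the difference on a shortened multi-line version of the question's text. The names `text`, `line_only` and `to_end` are illustrative and not part of the original answer.

```python
import re

# Shortened multi-line sample shaped like the text in the question.
text = (
    "text text text, more text\n"
    "\n"
    "LONDON, UK. January 28, 2019--\n"
    "\n"
    "More example of text, lots of text.\n"
)

# Without DOTALL, "." stops at the first newline: only the LONDON line matches.
line_only = re.findall(r'(?:LONDON).*', text)

# With DOTALL, "." also matches newlines, so the match runs to the end of the text.
to_end = re.findall(r'(?:LONDON).*', text, flags=re.DOTALL)

print(line_only)
print(to_end)
```

On text that keeps its line breaks, the second form is the one that returns the dateline plus the article body.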

Edit:

I read the question too quickly. What you need is:

pattern = r'(?s)[A-Z]{4}.*'

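As @Pushpesh Kumar Rajwanshi suggested in the comments. The `(?s)` flag makes `.` match newlines as well, so the match continues across lines to the end of the text.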
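As a minimal sketch of how this pattern could be applied to text that still contains line breaks, the helper below returns everything from the first run of four uppercase letters onward. The function name `from_first_caps_run` and the PARIS sample are hypothetical, chosen only to show that the pattern is not tied to LONDON.

```python
import re

# (?s) lets "." match newlines, so ".*" runs to the end of the text.
pattern = re.compile(r'(?s)[A-Z]{4}.*')

def from_first_caps_run(text):
    """Return the text from the first run of 4 uppercase letters, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None

sample = (
    "intro line\n"
    "   -- Another line of Text\n"
    "\n"
    "PARIS, France. January 28, 2019--\n"
    "\n"
    "Body of the article.\n"
)

result = from_first_caps_run(sample)
print(result)
```

Using `search` instead of `findall` returns a single string rather than a one-element list, and makes the no-match case explicit.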

Answer 1: (score: 0)

For a more general approach, try:

import re

# Four consecutive uppercase letters, then the rest of the line.
# "." does not match newlines by default, so this relies on the sample being one line.
four_caps = re.compile(r'[A-Z]{4}.*')
string = "text text text, more text -- Some More Texty Text Text better manage their online privacy needs  -- Another line of Text in foster programs  LONDON, UK. January 28, 2019-- More example of text, lots of text, Text text. Imagine this is a long article... blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah."

output = re.findall(four_caps, string)
print(output)
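One caveat with `[A-Z]{4}`: it also fires on any acronym that appears earlier in the text. If the dateline always begins at the start of a line (an assumption, not something the question states), anchoring the pattern there is a possible refinement. The sketch below is one way to do that; `dateline_onward` and the sample text are hypothetical.

```python
import re

# (?m): "^" matches at the start of every line.
# (?s): "." matches newlines, so ".*" runs to the end of the text.
# Assumption: the dateline word begins a line and is 4+ uppercase letters.
anchored = re.compile(r'(?ms)^[A-Z]{4,}\b.*')
unanchored = re.compile(r'(?s)[A-Z]{4}.*')

def dateline_onward(text):
    """Return the text from the first line starting with 4+ uppercase letters."""
    match = anchored.search(text)
    return match.group(0) if match else None

sample = (
    "Some text mentioning NASA mid-line\n"
    "   -- Another line of Text\n"
    "\n"
    "LONDON, UK. January 28, 2019--\n"
    "\n"
    "Body text.\n"
)

anchored_result = dateline_onward(sample)
unanchored_result = unanchored.search(sample).group(0)
print(anchored_result)
```

The anchored version skips the mid-line acronym and starts at the dateline, while the unanchored pattern starts at the acronym. A line that itself begins with an acronym would still match, so this only narrows the problem.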