Python regular expression for a pattern that spans multiple lines

Time: 2019-03-17 02:31:37

Tags: python regex

I want to extract all of the text printed after "AAAAAAAAAAAAAAAAAA".

I could not get this to work on the following text:

Give me some text!
AAAAAAAAAAAAAAAAAA




        S
       p
      p
     p
Epppp

Also, is it possible to use a variable in the regular expression instead of hardcoding the string "AAAAAAAAAAAAAAAAAA"?

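The reason is that the text "AAAAAAAAAAAAAAAAAA" comes from a variable and changes. So I want to look for the variable's current value in the pattern and then extract all of the text after it.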

1 answer:

Answer 0: (score: 3)

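Use re.S or re.DOTALL (they are synonyms) so that "." also matches newlines, which lets findall match across lines. In your case, search is probably more appropriate, since you only want one match. To make it work with a string that is not hardcoded, just build the pattern with string formatting or string concatenation. To keep regex metacharacters in the string from being interpreted as part of the pattern, run it through re.escape first.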

import re

result = """Give me some text!
AAAAAAAAAAAAAAAAAA




        S
       p
      p
     p
Epppp"""

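s = 'AAAAAAAAAAAAAAAAAA'
# With formatting; re.S lets "." match newlines as well
m = re.search(r'{}(.*)'.format(re.escape(s)), result, re.S)
# With concatenation (builds the same pattern)
m = re.search(re.escape(s) + r'(.*)', result, re.S)

# group(1) is everything after the marker, starting with the newline that follows it
print(m.group(1))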
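
As a rough sketch of the same idea wrapped in a reusable function, the example below also shows what happens without re.DOTALL and when the marker is absent. The names text_after and sample, and the marker "[v1.0]", are made up for illustration and do not come from the question.

```python
import re


def text_after(marker, text):
    """Return everything in `text` after the first occurrence of `marker`, or None."""
    # re.escape makes the marker match literally (hypothetical helper, not from the question);
    # re.DOTALL lets "." cross newlines so the capture runs to the end of the text.
    m = re.search(re.escape(marker) + r'(.*)', text, re.DOTALL)
    return m.group(1) if m else None


# Made-up input: the marker contains regex metacharacters ("[", ".", "]").
sample = "header\n[v1.0]\nline one\nline two"

# Without re.DOTALL, "." stops at the newline right after the marker,
# so the captured group is empty.
no_flag = re.search(re.escape("[v1.0]") + r'(.*)', sample)
print(repr(no_flag.group(1)))

# With re.DOTALL, the capture includes every following line.
print(repr(text_after("[v1.0]", sample)))

# A marker that does not occur yields None instead of raising AttributeError.
print(text_after("missing", sample))
```

Returning None for a missing marker avoids the AttributeError that m.group(1) raises when search finds nothing. If the leading newline is unwanted, calling lstrip("\n") on the result is one simple option.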