Question

I want to extract all of the text printed after "AAAAAAAAAAAAAAAAAA".

The following does not work:

Give me some text!
AAAAAAAAAAAAAAAAAA




        S
       p
      p
     p
Epppp

Also, is it possible to specify a variable in the regular expression instead of hard-coding the string "AAAAAAAAAAAAAAAAAA"?

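The reason is that the text "AAAAAAAAAAAAAAAAAA" comes from a variable and changes. So I want to look for a particular variable value in the pattern and then extract all of the text after it.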

Answer 1

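Use re.S or re.DOTALL (they are synonyms) so that "." also matches newlines and the match can span multiple lines. In your case, search is probably a better fit than findall, since you only want a single match. To make it work with a string that is not hard-coded, build the pattern with string formatting or string concatenation. To keep any regex metacharacters in that string from being interpreted as part of the pattern, run it through re.escape first.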

import re

result = """Give me some text!
AAAAAAAAAAAAAAAAAA




        S
       p
      p
     p
Epppp"""

s = 'AAAAAAAAAAAAAAAAAA'
# With formatting
m = re.search(r'{}(.*)'.format(re.escape(s)), result, re.S)
# With concatenation
m = re.search(re.escape(s) + r'(.*)', result, re.S)

# Group 1 holds everything after the marker, newlines included.
print(m.group(1))
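
The same idea can be wrapped in a small helper. This is only a sketch: the function name text_after and the shortened sample text are made up for illustration. It returns None when the marker is missing (re.search returns None in that case, so calling .group on it would fail), and it shows what happens without re.DOTALL and with a marker that contains regex metacharacters.

```python
import re


def text_after(marker, text):
    """Return everything in `text` after the first `marker`, or None if absent.

    Hypothetical helper for illustration; not part of the re module.
    """
    # re.escape makes metacharacters in the marker ('.', '*', '(' ...) match literally.
    pattern = re.escape(marker) + r'(.*)'
    # re.DOTALL lets '.' match newlines, so the group runs to the end of the text.
    m = re.search(pattern, text, re.DOTALL)
    return m.group(1) if m else None


# Shortened stand-in for the text in the question.
sample = "Give me some text!\nAAAA\n\n   S\n  p\nEpppp"

print(repr(text_after("AAAA", sample)))     # text after the marker, newlines included
print(repr(text_after("missing", sample)))  # None: the marker does not occur

# Without re.DOTALL, '.' stops at the first newline, so the group is empty here.
print(repr(re.search(r'AAAA(.*)', sample).group(1)))

# A marker containing regex metacharacters still matches literally.
print(repr(text_after("a.b*", "x a.b* tail\nmore")))
```

If the marker can appear more than once, note that re.search anchors on the first occurrence, so the result starts right after that first match.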
