I want to extract all of the text printed after "AAAAAAAAAAAAAAAAAA".
The following does not work:
Give me some text!
AAAAAAAAAAAAAAAAAA
S
p
p
p
Epppp
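Also, is it possible to specify a variable in the regular expression instead of hard-coding the string "AAAAAAAAAAAAAAAAAA"?
The reason is that the text "AAAAAAAAAAAAAAAAAA" comes from a variable and changes. So I want to look for the current value of that variable in the pattern and then extract all of the text after it.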
Answer 0: (score: 3)
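Use re.S or re.DOTALL (they are synonyms) so that "." also matches newline characters, which lets findall match across lines. In your case, search is probably more appropriate, since you only want one match. To make it work with a string that is not hard-coded, build the pattern with string formatting or string concatenation. To keep any regex metacharacters in the string from being interpreted as part of the pattern, run it through re.escape first.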
import re
result = """Give me some text!
AAAAAAAAAAAAAAAAAA
S
p
p
p
Epppp"""
s = 'AAAAAAAAAAAAAAAAAA'
# With formatting
m = re.search(r'{}(.*)'.format(re.escape(s)), result, re.S)
# With concatenation
m = re.search(re.escape(s) + r'(.*)', result, re.S)
# group(1) holds everything after the marker, including the newline that follows it
print(m.group(1))
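The same idea can be wrapped in a small helper that also handles the case where the marker is not found. This is only a sketch: the function name text_after and the sample strings are made up for illustration and are not part of the original answer.

```python
import re


def text_after(marker, text):
    """Return everything in `text` after the first occurrence of `marker`.

    Returns None when the marker does not occur in the text.
    """
    # re.escape makes the marker match literally, even if it contains
    # regex metacharacters such as '.', '*' or '('.
    pattern = re.escape(marker) + r'(.*)'
    # re.DOTALL lets '.' match newlines, so the group runs to the end of the text.
    m = re.search(pattern, text, re.DOTALL)
    return m.group(1) if m else None


sample = "Give me some text!\nAAAAAAAAAAAAAAAAAA\nS\np\np\np\nEpppp"

print(repr(text_after("AAAAAAAAAAAAAAAAAA", sample)))  # '\nS\np\np\np\nEpppp'
print(repr(text_after("a.b(", "x a.b( tail")))         # ' tail'
print(text_after("missing", sample))                   # None
```

The captured text starts with the newline that follows the marker; calling .lstrip('\n') on the result drops it if it is not wanted.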