Question

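I have been trying to clean up some data with the approach below, but my regex won't match past \n. I don't understand why, because I thought .* should grab everything.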

table = u'POSITIONS AND APPOINTMENTS  2006  present Fellow, University of Colorado at Denver Health Sciences Center, Native Elder Research Center, American Indian and Alaska Native Program, Denver, CO  \n2002  present Assistant Professor, Department of Development Sociology, Cornell \n   University, Ithaca, NY   \n \n1999  2001'

output = table.encode('ascii', errors='ignore').strip()

pat = r'POSITIONS.*'.format(endword)
print pat
regex = re.compile(pat)
if regex.search(output):
    print regex.findall(output)
    pieces.append(regex.findall(output))

The above returns:

['POSITIONS AND APPOINTMENTS  2006  present Fellow, University of Colorado at Denver Health Sciences Center, Native Elder Research Center, American Indian and Alaska Native Program, Denver, CO  ']

Answer 1

Unless you specify the re.DOTALL (or re.S) flag, the dot (.) does not match a newline.

>>> import re
>>> re.search('.', '\n')
>>> re.search('.', '\n', flags=re.DOTALL)
<_sre.SRE_Match object at 0x0000000002AB8100>

regex = re.compile(pat, flags=re.DOTALL)
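To make the difference concrete, here is a minimal sketch that runs the same pattern with and without the flag. The `text` value is a shortened stand-in for the string in the question (an assumption for illustration, not the original data), and the example is written for Python 3 rather than the Python 2 used above.

```python
import re

# Shortened, hypothetical stand-in for the text in the question.
text = (
    "POSITIONS AND APPOINTMENTS  2006  present Fellow, Denver, CO  \n"
    "2002  present Assistant Professor, Cornell \n"
    "   University, Ithaca, NY"
)

# Default behaviour: '.' matches any character except a newline,
# so '.*' stops at the end of the first line.
default_match = re.search(r"POSITIONS.*", text).group(0)

# With re.DOTALL, '.' also matches '\n', so '.*' runs to the end of the string.
dotall_match = re.search(r"POSITIONS.*", text, flags=re.DOTALL).group(0)

# The inline flag (?s) at the start of the pattern has the same effect as re.DOTALL.
inline_match = re.search(r"(?s)POSITIONS.*", text).group(0)

print(repr(default_match))
print(repr(dotall_match))
```

If the goal is to stop at a later marker instead of consuming the whole string, a non-greedy `.*?` followed by that marker would be the usual adjustment when DOTALL is on.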
