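I have been trying to clean up some data with the code below, but my regex won't match past \n. I don't understand why, because I thought .* should grab everything.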
import re

table = ("POSITIONS AND APPOINTMENTS 2006 present Fellow, University of Colorado at Denver Health Sciences Center, "
         "Native Elder Research Center, American Indian and Alaska Native Program, Denver, CO \n"
         "2002 present Assistant Professor, Department of Development Sociology, Cornell \n"
         " University, Ithaca, NY \n \n1999 2001")
output = table.encode('ascii', errors='ignore').strip()  # drop non-ASCII characters
pat = r'POSITIONS.*'  # the original .format(endword) call had no placeholder to fill and endword was undefined
print pat
regex = re.compile(pat)
pieces = []
if regex.search(output):
    print regex.findall(output)
    pieces.append(regex.findall(output))
The above returns:
['POSITIONS AND APPOINTMENTS 2006 present Fellow, University of Colorado at Denver Health Sciences Center, Native Elder Research Center, American Indian and Alaska Native Program, Denver, CO ']
Answer:
You need the re.DOTALL (or re.S) flag; otherwise . does not match a newline.
>>> import re
>>> re.search('.', '\n')  # returns None, so nothing is printed
>>> re.search('.', '\n', flags=re.DOTALL)
<_sre.SRE_Match object at 0x0000000002AB8100>
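A frequent mix-up is reaching for re.MULTILINE instead: that flag only changes how ^ and $ anchor, and does not affect `.`. The short Python 3 sketch below (the two-line sample string is made up for illustration) compares the default behaviour, re.MULTILINE, re.DOTALL, and the inline (?s) flag, which the re module treats as equivalent to passing re.DOTALL.

```python
import re

text = "line one\nline two"

# Default: '.' matches any character except '\n', so '.*' stops at the newline.
default = re.search(r"one.*", text).group()
# re.MULTILINE (re.M) only changes how ^ and $ behave; '.' still skips '\n'.
multiline = re.search(r"one.*", text, flags=re.MULTILINE).group()
# re.DOTALL (re.S) makes '.' match '\n' too, so '.*' runs to the end of the string.
dotall = re.search(r"one.*", text, flags=re.DOTALL).group()
# An inline (?s) at the start of the pattern has the same effect as re.DOTALL.
inline = re.search(r"(?s)one.*", text).group()

print(repr(default))    # 'one'
print(repr(multiline))  # 'one'
print(repr(dotall))     # 'one\nline two'
print(repr(inline))     # 'one\nline two'
```

The inline form is handy when the pattern is passed to code you don't control and you cannot add a flags argument.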
In your code, compile the pattern with the flag:
regex = re.compile(pat, flags=re.DOTALL)
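Putting it together, here is a sketch of the question's code with the flag applied, written for Python 3. The sample string reuses the text quoted in the question, with its line breaks written as \n escapes. The added .decode("ascii") step is an assumption needed only under Python 3, where str.encode() returns bytes and a str pattern cannot search bytes.

```python
import re

# Text quoted in the question, with the line breaks written as \n escapes.
table = (
    "POSITIONS AND APPOINTMENTS 2006 present Fellow, University of Colorado at Denver "
    "Health Sciences Center, Native Elder Research Center, American Indian and Alaska "
    "Native Program, Denver, CO \n"
    "2002 present Assistant Professor, Department of Development Sociology, Cornell \n"
    " University, Ithaca, NY \n \n1999 2001"
)

# Drop non-ASCII characters, then decode back to str so a str pattern can search it.
output = table.encode("ascii", errors="ignore").decode("ascii").strip()

pat = r"POSITIONS.*"
# re.DOTALL lets '.*' continue past the embedded newlines.
regex = re.compile(pat, flags=re.DOTALL)

pieces = []
if regex.search(output):
    matches = regex.findall(output)
    print(matches)
    pieces.append(matches)
```

With the flag, findall returns a single match running from "POSITIONS" to the end of the text, newlines included, instead of stopping after "Denver, CO ".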