Python re: matching whitespace and newlines

Time: 2012-03-02 15:55:40

Tags: python regex

import re

html6="""
<p<ins style="background:#e6ffe6;">re><code</ins>>
int aint bint c<ins style="background:#e6ffe6;"></code></ins></p<ins style="background:#e6ffe6;">re</ins>><p>int d</p>
"""

html6 and html7 are the same, except that html7 contains additional "\n" characters.

html7="""
<p<ins style="background:#e6ffe6;">re><code</ins>>int a
int b
int c<ins style="background:#e6ffe6;">
</code></ins></p<ins style="background:#e6ffe6;">re</ins>>
<p>int d</p>
"""

p_to_pre_code_pattern = re.compile(
"""<p
<(?P<action_tag>(del|ins)) (?P<action_attr>.*)>re><code</(?P=action_tag)>
>
(?P<text>.*?)
<(?P=action_tag) (?P=action_attr)>
</code></(?P=action_tag)>
</p
<(?P=action_tag) (?P=action_attr)>re</(?P=action_tag)>
>""",re.VERBOSE)


print(re.match(p_to_pre_code_pattern, html6))
print(re.match(p_to_pre_code_pattern, html7))

Neither html6 nor html7 matches. But if I replace "\n" with "", then both match.

print(re.match(p_to_pre_code_pattern, html6.replace("\n", "")))
print(re.match(p_to_pre_code_pattern, html7.replace("\n", "")))

I would like to know how to change p_to_pre_code_pattern so that it matches both html6 and html7 without calling replace("\n", "").

1 answer:

Answer 0 (score: 1)

You are probably missing the re.DOTALL flag when calling re.compile: re.compile(..., re.VERBOSE|re.DOTALL)

re.S 
re.DOTALL 

Make the '.' special character match any character at all, including a newline;
without this flag, '.' will match anything except a newline.
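
Applied to the two strings from the question, a minimal sketch might look like the following. It makes two assumptions that go beyond the re.DOTALL flag suggested in the answer. First, both strings as shown begin with a newline, and re.match only matches at the very start of the string, so the sketch allows leading whitespace with \s*. Second, html7 has a newline between the closing <ins ...> marker and </code>, so a \s* is allowed there too. The name `pattern` and the `[^>]*` attribute class are choices of this sketch, not the asker's code.

```python
import re

html6 = """
<p<ins style="background:#e6ffe6;">re><code</ins>>
int aint bint c<ins style="background:#e6ffe6;"></code></ins></p<ins style="background:#e6ffe6;">re</ins>><p>int d</p>
"""

html7 = """
<p<ins style="background:#e6ffe6;">re><code</ins>>int a
int b
int c<ins style="background:#e6ffe6;">
</code></ins></p<ins style="background:#e6ffe6;">re</ins>>
<p>int d</p>
"""

# re.VERBOSE ignores literal whitespace in the pattern, so the space between
# the tag name and its attributes ends up captured inside action_attr.
# re.DOTALL lets "." in the text group match newlines as well.
pattern = re.compile(r"""
    \s*<p
    <(?P<action_tag>del|ins)(?P<action_attr>[^>]*)>re><code</(?P=action_tag)>
    >
    (?P<text>.*?)
    <(?P=action_tag)(?P=action_attr)>\s*
    </code></(?P=action_tag)>
    </p
    <(?P=action_tag)(?P=action_attr)>re</(?P=action_tag)>
    >""", re.VERBOSE | re.DOTALL)

for name, html in (("html6", html6), ("html7", html7)):
    m = pattern.match(html)
    # Show the captured text, or None if the pattern did not match.
    print(name, repr(m.group("text")) if m else None)
```

Using re.search instead of re.match would be another way to skip the leading newline. The \s* approach is used here only so that the calls stay the same as in the question.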