Regex: do not match if a word appears in the string

Time: 2014-08-22 22:06:03

Tags: python regex

I have a URL, and I want the pattern to fail if the URL contains the word "season". Here are two examples:

CONTAINS SEASON, DO NOT MATCH
'http://imdb.com/title/tt0285331/episodes?this=1&season=7&ref_=tt_eps_sn_7'

DOES NOT CONTAIN SEASON, MATCH
'http://imdb.com/title/tt0285331/'

This is what I have so far, but I'm worried that .+ will match everything up to the end of the string. What is the correct regex to use here?

r'http://imdb.com/title/tt(\d)+/.+^[season].+'

3 answers:

Answer 0 (score: 2)

Use a negative lookahead:

urls='''\
http://imdb.com/title/tt0285331/episodes?this=1&season=7&ref_=tt_eps_sn_7
http://imdb.com/title/tt0285331/'''

import re

print(re.findall(r'^(?!.*\bseason\b)(.*)', urls, re.M))
# ['http://imdb.com/title/tt0285331/']
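
To check one URL at a time rather than a multi-line block, the same lookahead can be wrapped in a small helper. This is only a sketch: the names NO_SEASON and matches_without_season are made up for illustration, and the pattern is the one from this answer.

```python
import re

# Same pattern as above. NO_SEASON and matches_without_season are
# hypothetical names, not part of any library.
NO_SEASON = re.compile(r'^(?!.*\bseason\b)(.*)')


def matches_without_season(url):
    """Return True when `url` does not contain the whole word 'season'."""
    return NO_SEASON.match(url) is not None


with_season = 'http://imdb.com/title/tt0285331/episodes?this=1&season=7&ref_=tt_eps_sn_7'
without_season = 'http://imdb.com/title/tt0285331/'

print(matches_without_season(with_season))     # False
print(matches_without_season(without_season))  # True
```

Because of the \b word boundaries, only the standalone word is rejected: a URL containing "seasons" or "_season_" would still match, since letters, digits and underscores all count as word characters.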

Answer 1 (score: 2)

You cannot put a whole word inside a character class; you have to use a negative lookahead instead.

>>> s = '''
http://imdb.com/title/tt0285331/episodes?this=1&season=7&ref_=tt_eps_sn_7
http://imdb.com/title/tt0285331/
http://imdb.com/title/tt1111111/episodes?this=2
http://imdb.com/title/tt0123456/episodes?this=1&season=1&ref_=tt_eps_sn_1'''
>>> import re
>>> re.findall(r'\bhttp://imdb.com/title/tt(?!\S+\bseason)\S+', s)
# ['http://imdb.com/title/tt0285331/', 'http://imdb.com/title/tt1111111/episodes?this=2']
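
As a rough illustration of why the attempt in the question fails: [season] is a set of single characters, and a ^ outside the brackets is a start-of-string anchor, not a negation. The variable names below are only for demonstration.

```python
import re

# The pattern from the question, copied unchanged.
asker_pattern = r'http://imdb.com/title/tt(\d)+/.+^[season].+'
url = 'http://imdb.com/title/tt0285331/'

# [season] matches ONE character out of the set {s, e, a, o, n}.
chars = re.findall(r'[season]', 'season')
print(chars)        # ['s', 'e', 'a', 's', 'o', 'n']

# [^season] matches one character that is NOT in that set.
first_other = re.search(r'[^season]', url).group()
print(first_other)  # 'h'

# Outside a class, ^ means "start of string". After .+ has consumed at
# least one character, that anchor can never succeed (without re.M).
attempt = re.search(asker_pattern, url)
print(attempt)      # None
```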

Answer 2 (score: 2)

Use a negative lookahead after tt\d+/:

>>> import re
>>> s = """http://imdb.com/title/tt0285331/episodes?this=1&season=7&ref_=tt_eps_sn_7
... http://imdb.com/title/tt0285331/
... """
>>> m = re.findall(r'^http://imdb.com/title/tt\d+/(?:(?!season).)*$', s, re.M)
>>> for i in m:
...     print(i)
... 
http://imdb.com/title/tt0285331/
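
The pattern above can be easier to follow when written with re.X (verbose mode) and comments. This is a sketch with two assumptions that are not in the original answer: the dot in imdb\.com is escaped, and a third sample URL is added to show that a query string without "season" still matches.

```python
import re

# Verbose rewrite of the pattern above. Escaping the dot in imdb\.com
# is an added hardening, not part of the original answer.
pattern = re.compile(r"""
    ^http://imdb\.com/title/tt\d+/   # fixed prefix and numeric title id
    (?:(?!season).)*                 # consume a char only if "season" does not start here
    $                                # the whole line must be consumed
""", re.M | re.X)

s = """http://imdb.com/title/tt0285331/episodes?this=1&season=7&ref_=tt_eps_sn_7
http://imdb.com/title/tt0285331/
http://imdb.com/title/tt1111111/episodes?this=2
"""

matches = pattern.findall(s)
for m in matches:
    print(m)
```

Unlike a version with \bseason\b, this one has no word boundaries, so it rejects any line where the substring "season" appears after the title id, including "seasons".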