I have a URL, and I want it not to match if it contains the word "season". Here are two examples:
CONTAINS SEASON, DO NOT MATCH
'http://imdb.com/title/tt0285331/episodes?this=1&season=7&ref_=tt_eps_sn_7'
DOES NOT CONTAIN SEASON, MATCH
'http://imdb.com/title/tt0285331/'
This is what I have so far, but I am worried that .+ will match everything up to the end. What is the correct regular expression to use here?
r'http://imdb.com/title/tt(\d)+/.+^[season].+'
Answer 0 (score: 2)
Use a negative lookahead:
urls='''\
http://imdb.com/title/tt0285331/episodes?this=1&season=7&ref_=tt_eps_sn_7
http://imdb.com/title/tt0285331/'''
import re
print(re.findall(r'^(?!.*\bseason\b)(.*)', urls, re.M))
# ['http://imdb.com/title/tt0285331/']
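If you are checking one URL at a time rather than a multi-line block, the same lookahead can be wrapped in a small helper. This is only a sketch: the names `NO_SEASON` and `has_no_season` and the `seasons=all` URL are made up for illustration. It also shows a side effect of the `\b` word boundaries: only the standalone word "season" is rejected, so a URL containing "seasons" still passes.

```python
import re

# Same idea as the pattern above: the lookahead is evaluated once, at the
# start of the string, and fails if the whole word "season" appears anywhere.
NO_SEASON = re.compile(r'^(?!.*\bseason\b).*$')

def has_no_season(url):
    """Return True when `url` does not contain the standalone word 'season'."""
    return NO_SEASON.match(url) is not None

print(has_no_season('http://imdb.com/title/tt0285331/'))
print(has_no_season('http://imdb.com/title/tt0285331/episodes?this=1&season=7&ref_=tt_eps_sn_7'))
# Hypothetical URL: "seasons" has no word boundary after "season", so it is NOT rejected.
print(has_no_season('http://imdb.com/title/tt0285331/episodes?seasons=all'))
```

If "seasons" should be rejected as well, dropping the `\b` on either side of `season` would probably be the simplest change.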
Answer 1 (score: 2)
You cannot use a whole word inside a character class; you have to use a negative lookahead instead.
>>> s = '''
http://imdb.com/title/tt0285331/episodes?this=1&season=7&ref_=tt_eps_sn_7
http://imdb.com/title/tt0285331/
http://imdb.com/title/tt1111111/episodes?this=2
http://imdb.com/title/tt0123456/episodes?this=1&season=1&ref_=tt_eps_sn_1'''
>>> import re
>>> re.findall(r'\bhttp://imdb.com/title/tt(?!\S+\bseason)\S+', s)
# ['http://imdb.com/title/tt0285331/', 'http://imdb.com/title/tt1111111/episodes?this=2']
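To make the point about character classes concrete, here is a short sketch (Python 3; the sample string 'seven' and the variable names are just for illustration). `[season]` matches a single character out of the set s, e, a, o, n. In the pattern from the question, the `^` sits outside the brackets, so it acts as a start-of-string anchor in the middle of the pattern, and the pattern cannot match either example URL.

```python
import re

# A character class matches exactly ONE character from the set, never a word.
class_hits = re.findall(r'[season]', 'seven')
print(class_hits)  # ['s', 'e', 'e', 'n']

# The pattern from the question, unchanged.
attempt = re.compile(r'http://imdb.com/title/tt(\d)+/.+^[season].+')

urls = [
    'http://imdb.com/title/tt0285331/episodes?this=1&season=7&ref_=tt_eps_sn_7',
    'http://imdb.com/title/tt0285331/',
]
for url in urls:
    # After ".+" has consumed at least one character, "^" can no longer match,
    # so search() returns None for both URLs.
    print(attempt.search(url))
```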
Answer 2 (score: 2)
Use a negative lookahead after tt\d+/:
>>> import re
>>> s = """http://imdb.com/title/tt0285331/episodes?this=1&season=7&ref_=tt_eps_sn_7
... http://imdb.com/title/tt0285331/
... """
>>> m = re.findall(r'^http://imdb.com/title/tt\d+/(?:(?!season).)*$', s, re.M)
>>> for i in m:
...     print(i)
...
http://imdb.com/title/tt0285331/
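For anyone puzzled by the `(?:(?!season).)*` part: it consumes one character at a time, and before each character it checks that "season" does not start at that position. For single-line input like these URLs, that should give the same result as one lookahead placed right after the slash. The sketch below compares the two forms (Python 3; the names `stepwise` and `single` are made up, and escaping the dots in `imdb\.com` is my own tightening, not part of the answer above).

```python
import re

urls = [
    'http://imdb.com/title/tt0285331/',
    'http://imdb.com/title/tt0285331/episodes?this=1&season=7&ref_=tt_eps_sn_7',
    'http://imdb.com/title/tt1111111/episodes?this=2',
]

# Step-by-step form: before consuming each character, check that
# "season" does not start at that position.
stepwise = re.compile(r'^http://imdb\.com/title/tt\d+/(?:(?!season).)*$')

# Single-lookahead form: check once, right after the slash, that
# "season" does not occur anywhere in the rest of the string.
single = re.compile(r'^http://imdb\.com/title/tt\d+/(?!.*season).*$')

stepwise_hits = [u for u in urls if stepwise.match(u)]
single_hits = [u for u in urls if single.match(u)]

print(stepwise_hits == single_hits)
for url in stepwise_hits:
    print(url)
```

Note that neither form uses `\b`, so unlike the word-boundary version they would also reject a URL containing "seasons".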