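I am trying to write a regular expression to find specific data in HTML. For example, one of the results I get is
'Channel Boleyn emotion'
but I do not want to include 'emotion'. How can I exclude that string?
Here is the URL: http://www.worldfootball.net/schedule/eng-premier-league-2015-2016-spieltag/37/
import urllib2
from re import findall
response = urllib2.urlopen("http://www.worldfootball.net/schedule/eng-premier-league-2015-2016-spieltag/37/")
html_bytes = response.read()
html = html_bytes.decode('utf-8')
# Raw string so the backslashes reach the regex engine unchanged
ranking = findall(r'[e]="(\w* ?\w* ?\w*)', html)
# findall returns a list, so print it rather than calling it
print ranking
The code does not work yet (I am new to this), but I just want to get rid of 'Channel Boleyn emotion' so that I can move on. Thanks.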
Answer 0 (score: 1)
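You need to use a negative lookbehind assertion.
^\w+(?: \w+)*(?<!\bemotion)$
(?<!\bemotion)$ asserts that the string does not end with the word emotion.
or
^\w+(?: \w+)*(?<!\semotion)$
or, without a regex, filter on the last word: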
>>> s = [
'Leicester City',
'Tottenham Hotspur',
'Arsenal FC',
'Manchester City',
'Manchester United',
'Southampton FC',
'West Ham United',
'Liverpool FC',
'Chelsea FC',
'Stoke City',
'Swansea City',
'Everton FC',
'Watford FC',
'Crystal Palace',
'West Bromwich Albion',
'AFC Bournemouth',
'Sunderland AFC',
'Newcastle United',
'Norwich City',
'Aston Villa',
'Channel Boleyn emotion',
'Channel Boleyn emotion']
>>> [i for i in s if i.split()[-1] != 'emotion']
['Leicester City', 'Tottenham Hotspur', 'Arsenal FC', 'Manchester City', 'Manchester United', 'Southampton FC', 'West Ham United', 'Liverpool FC', 'Chelsea FC', 'Stoke City', 'Swansea City', 'Everton FC', 'Watford FC', 'Crystal Palace', 'West Bromwich Albion', 'AFC Bournemouth', 'Sunderland AFC', 'Newcastle United', 'Norwich City', 'Aston Villa']
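As a rough check of the first lookbehind pattern above, here is a minimal sketch that applies it to a short list. The three sample names are a hypothetical stand-in for the full findall() result, and the `NOT_EMOTION` name is only for illustration.

```python
import re

# Pattern from the answer: space-separated words that do not end
# with the word "emotion".
NOT_EMOTION = re.compile(r'^\w+(?: \w+)*(?<!\bemotion)$')

# Hypothetical sample; the real list would come from findall() on the page.
names = ['Leicester City', 'Arsenal FC', 'Channel Boleyn emotion']

# Keep only the names the pattern accepts.
kept = [n for n in names if NOT_EMOTION.match(n)]
print(kept)
```

For a fixed suffix like this one, the split()-based filter is probably easier to read; the regex form is mainly useful if the check has to live inside a larger pattern.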
Answer 1 (score: 0)
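You can use the pattern title="([^"]*)">\1</a>
. The backreference \1 must repeat the captured title, so this only matches links whose title attribute and link text are identical.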
>>> print findall(r'title="([^"]*)">\1</a>', html)
[u'Norwich City', u'Manchester United', u'AFC Bournemouth', u'West Bromwich Albion', u'Aston Villa', u'Newcastle United', u'Crystal Palace', u'Stoke City', u'Sunderland AFC', u'Chelsea FC', u'West Ham United', u'Swansea City', u'Leicester City', u'Everton FC', u'Tottenham Hotspur', u'Southampton FC', u'Liverpool FC', u'Watford FC', u'Manchester City', u'Arsenal FC', u'Leicester City', u'Leicester City', u'Tottenham Hotspur', u'Tottenham Hotspur', u'Arsenal FC', u'Arsenal FC', u'Manchester City', u'Manchester City', u'Manchester United', u'Manchester United', u'Southampton FC', u'Southampton FC', u'West Ham United', u'West Ham United', u'Liverpool FC', u'Liverpool FC', u'Chelsea FC', u'Chelsea FC', u'Stoke City', u'Stoke City', u'Swansea City', u'Swansea City', u'Everton FC', u'Everton FC', u'Watford FC', u'Watford FC', u'Crystal Palace', u'Crystal Palace', u'West Bromwich Albion', u'West Bromwich Albion', u'AFC Bournemouth', u'AFC Bournemouth', u'Sunderland AFC', u'Sunderland AFC', u'Newcastle United', u'Newcastle United', u'Norwich City', u'Norwich City', u'Aston Villa', u'Aston Villa']
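Since the result above contains each club several times, it may help to drop the repeats afterwards. The following is a small sketch under stated assumptions: the `html` fragment is hypothetical markup invented for illustration (the real page structure may differ), and the order-preserving de-duplication is an addition, not part of the original answer.

```python
import re

# Hypothetical fragment standing in for the real page markup.
html = (
    '<a href="/teams/arsenal-fc/" title="Arsenal FC">Arsenal FC</a>'
    '<a href="/x/" title="Channel Boleyn emotion"><img src="x.png"></a>'
    '<a href="/teams/arsenal-fc/" title="Arsenal FC">Arsenal FC</a>'
    '<a href="/teams/stoke-city/" title="Stoke City">Stoke City</a>'
)

# \1 must repeat the captured title exactly, so links whose text
# differs from the title are skipped.
matches = re.findall(r'title="([^"]*)">\1</a>', html)

# Drop repeats while keeping first-seen order.
seen = set()
teams = []
for name in matches:
    if name not in seen:
        seen.add(name)
        teams.append(name)

print(teams)
```

In this made-up fragment the 'Channel Boleyn emotion' link is skipped because its text is an image rather than a copy of the title; whether the real page behaves the same way would need to be checked against the actual HTML.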