I have a function that extracts URLs from ESPN. The URLs look like this: http://www.espncricinfo.com/series/13224/scorecard/426406/scotland-vs-england-only-odi-england-in-scotland-odi-match-2010, http://www.espncricinfo.com/series/13240/scorecard/426384/ireland-vs-australia-only-odi-australia-tour-of-england-and-ireland-2010
I have created a list of countries. If a URL contains a country from that list, I want to print a message; otherwise I want to move on to the next extracted URL.
    all_countries=['England','India','West Indies']
    #one_day will have all the links
    for day in one_day:
        d=day.split('-')
        if d in all_countries:
            print(day)
        else:
            next
It does not work. Thanks for your help.
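Answer 0 (score: 1)
This happens because .split() returns a list, so you have to iterate over the items in that list. As written, you are asking the computer whether
    ["http://www.espncricinfo.com/series/13224/scorecard/426406/scotland", "vs", "england", "only", "odi", "england", "in", "scotland", "odi", "match", "2010"]
is in some list that looks like this (I assume):
    ["england", "scotland", "ireland", ...]
I suggest adding some print statements. A simple print(d) would reveal this behaviour. You have to loop over d: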
    for word in d:
        # assumes all_countries holds lowercase names, as in the list above
        if word in all_countries:
            print(word)
            break  # otherwise multiple words will trigger your logic multiple times
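Putting the pieces together, here is a minimal sketch of the fix using the two URLs from the question. The helper names slug_words and first_country_in are made up for illustration. It also assumes you only care about the last path segment of the URL: a plain split('-') leaves the first word glued to the rest of the address (".../426406/scotland"), so the sketch cuts the URL at its last '/' first.

```python
def slug_words(url):
    """Return the hyphen-separated words of the last path segment of a URL."""
    slug = url.rstrip('/').rsplit('/', 1)[-1]
    return slug.split('-')

def first_country_in(url, countries):
    """Return the first word of the URL slug that is a known country, else None."""
    lowered = {c.lower() for c in countries}  # compare in lowercase
    for word in slug_words(url):
        if word.lower() in lowered:
            return word
    return None

one_day = [
    'http://www.espncricinfo.com/series/13224/scorecard/426406/scotland-vs-england-only-odi-england-in-scotland-odi-match-2010',
    'http://www.espncricinfo.com/series/13240/scorecard/426384/ireland-vs-australia-only-odi-australia-tour-of-england-and-ireland-2010',
]
all_countries = ['England', 'India', 'West Indies']

for day in one_day:
    d = day.split('-')
    # A whole list is never equal to a single country string, so this is False.
    print(d in all_countries)
    print(first_country_in(day, all_countries))
```

Note that this only finds single-word names; 'West Indies' is split into two words and would need separate handling.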
Answer 1 (score: 1)
Here is a simple way to do it (assuming one_day is a list of URLs and all_countries is a list of country names):
    # (some example values for urls and country names)
    one_day = ['http://www.espncricinfo.com/...-vs-australia-only-odi-au...',
               'http://www.espncricinfo.com/...scotland-vs-england-only-...']
    all_countries = ['India', 'Ireland', 'Australia']
    for day in one_day:
        for country in all_countries:
            if country.lower() in day:
                print(f'found a match for {country}: `{day}`')
                # or just: print(day)
This works because in checks for substrings, for example:
    'Australia'.lower() in '...-vs-australia-only-odi-au...'
    ## True
That is exactly what the condition country.lower() in day checks on each iteration of the inner loop.
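One thing to be aware of with the nested loop: a URL that mentions two listed countries is reported twice. If you only want each URL once, the same substring check can be wrapped in any(). This is just a sketch; the helper name mentions_any is hypothetical, and the URLs are the two from the question.

```python
def mentions_any(url, countries):
    """True if any country name occurs as a substring of the URL (case-insensitive)."""
    url = url.lower()
    return any(country.lower() in url for country in countries)

one_day = [
    'http://www.espncricinfo.com/series/13224/scorecard/426406/scotland-vs-england-only-odi-england-in-scotland-odi-match-2010',
    'http://www.espncricinfo.com/series/13240/scorecard/426384/ireland-vs-australia-only-odi-australia-tour-of-england-and-ireland-2010',
]
all_countries = ['India', 'Ireland', 'Australia']

# Each URL appears at most once, however many countries it mentions.
matches = [day for day in one_day if mentions_any(day, all_countries)]
for day in matches:
    print(day)
```

With these values only the second URL is printed, and only once, even though it contains both 'ireland' and 'australia'.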
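P.S. You could also split on '-' as in the original post, in case you are worried about something like 'USA' matching a URL that contains '-musac...' or similar. To do that, you could write: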
    # lowercase the names once instead of on every comparison
    lowered = [c.lower() for c in all_countries]
    for day in one_day:
        day_split = day.split('-')
        for elem in day_split:
            if elem in lowered:
                print(f'found a match: `{day}`')
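One caveat with splitting on '-': a multi-word name such as 'West Indies' shows up in a URL as two separate words, so it never equals a single element. A possible workaround, sketched below, is to convert each name to its URL form and look for it between hyphens. The helper names country_slug and mentions and the example.com URLs are made up for illustration, and the sketch assumes the country names sit in the last path segment of the URL.

```python
def country_slug(name):
    """Turn a display name such as 'West Indies' into URL form: 'west-indies'."""
    return '-'.join(name.lower().split())

def mentions(url, country):
    """True if the country appears as whole hyphen-delimited word(s) in the last path segment."""
    slug = url.rstrip('/').rsplit('/', 1)[-1].lower()
    # Pad both sides with '-' so a match must start and end on a word boundary.
    return f'-{country_slug(country)}-' in f'-{slug}-'

# Hypothetical URLs, only to exercise the two cases discussed above.
print(mentions('http://www.example.com/west-indies-vs-england-1st-odi', 'West Indies'))
print(mentions('http://www.example.com/musac-vs-england-1st-odi', 'USA'))
```

The first call finds 'west-indies' as a unit, while the second rejects 'USA' because 'usa' only occurs inside 'musac'.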
Answer 2 (score: 1)
Or use a regular expression, which is more flexible ;)
    import re

    urls = ["http://www.espncricinfo.com/series/13224/scorecard/426406/scotland-vs-england-only-odi-england-in-scotland-odi-match-2010",
            "http://www.espncricinfo.com/series/13240/scorecard/426384/ireland-vs-australia-only-odi-australia-tour-of-england-and-ireland-2010",
            "http://www.espncricinfo.com/series/13240/scorecard/426384/titi-2010"
            ]
    countries = ['England',
                 'India',
                 'West Indies']

    # One case-insensitive alternation; the space in a name may be any
    # non-word character (such as '-'). Raw string avoids an invalid escape.
    pattern = re.compile('|'.join(countries).replace(' ', r'\W'), re.IGNORECASE)

    for url in urls:
        # search() scans the whole string, so no leading '.*?' is needed
        if pattern.search(url):
            print(url)
Result:
    http://www.espncricinfo.com/series/13224/scorecard/426406/scotland-vs-england-only-odi-england-in-scotland-odi-match-2010
    http://www.espncricinfo.com/series/13240/scorecard/426384/ireland-vs-australia-only-odi-australia-tour-of-england-and-ireland-2010
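A plain alternation still matches inside longer words (for example 'india' inside 'indiana'). If that matters, lookarounds can require that a match is not glued to other letters. The following is only a sketch: build_pattern and find_country are hypothetical helper names, and the example.com URL is invented for the test case.

```python
import re

def build_pattern(countries):
    """Compile one case-insensitive pattern matching any country as a whole word."""
    alternatives = []
    for name in countries:
        # Escape each word, then let any single non-word character (such as '-')
        # stand in for the space in a multi-word name.
        alternatives.append(r'\W'.join(re.escape(part) for part in name.split()))
    # The lookarounds reject matches that touch another letter on either side.
    return re.compile(r'(?<![a-z])(?:' + '|'.join(alternatives) + r')(?![a-z])',
                      re.IGNORECASE)

def find_country(url, pattern):
    """Return the text of the first country match in the URL, or None."""
    match = pattern.search(url)
    return match.group(0) if match else None

pattern = build_pattern(['England', 'India', 'West Indies'])
print(find_country('http://www.espncricinfo.com/series/13240/scorecard/426384/titi-2010', pattern))
print(find_country('http://www.example.com/indiana-vs-west-indies', pattern))
```

Compiling the pattern once also avoids rebuilding the expression for every URL.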
Answer 3 (score: 0)
In your case:
    all_countries=['England','India','West Indies']
    for day in one_day:
        d=day.split('-')
        if d in all_countries:
            print(day)
        else:
            next
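You are doing d = day.split('-'), so d is itself a list. You therefore need to iterate over d and check whether each value is in all_countries.
Another point: all_countries holds capitalized country names, so convert them to lowercase before making the comparison.
The snippet below may help: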
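    all_countries = ['England', 'India', 'West Indies']
    # lowercase the names once, outside the loops
    lowered = [x.lower() for x in all_countries]
    for day in one_day:
        d = day.split('-')
        for val in d:
            if val.lower() in lowered:
                print(day)
                break  # one match is enough; go on to the next URL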