Print the URL if it contains a specific keyword

Asked: 2019-06-18 19:15:23

Tags: python list for-loop

I have a function that extracts URLs from ESPN. The URLs look like this:

http://www.espncricinfo.com/series/13224/scorecard/426406/scotland-vs-england-only-odi-england-in-scotland-odi-match-2010

http://www.espncricinfo.com/series/13240/scorecard/426384/ireland-vs-australia-only-odi-australia-tour-of-england-and-ireland-2010

I have created a list of countries. If a URL contains a country from the list, I want to print a message; otherwise I want to move on to the next URL.

all_countries=['England','India','West Indies']

#one_day will have all the links
for day in one_day:
        d=day.split('-')
        if d in all_countries:
            print(day)
        else:
            next

It doesn't work. Thanks for your help.

4 answers:

Answer 0: (score: 1)

This is because .split() returns a list, so you have to iterate over the items in that list. Essentially, you are asking the computer whether

["http://www.espncricinfo.com/series/13224/scorecard/426406/scotland", "vs", "england", "only", "odi", "england", "in", "scotland", "odi", "match", "2010"]

is an element of some list that looks like this (I assume):

["england", "scotland", "ireland", ...]

I suggest adding some print statements. A simple print(d) will reveal this behavior. You have to iterate over d:

lowered = [c.lower() for c in all_countries]  # the words in the URL are lowercase
for word in d:
    if word in lowered:
        print(word)
        break # otherwise multiple words will trigger your logic multiple times
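To make the difference concrete, here is a minimal sketch comparing the whole-list check from the question with a per-word check. It reuses the first URL and the country list from the question; the names url, lowered, whole_list_check and matched_words are only illustrative.

```python
all_countries = ['England', 'India', 'West Indies']
url = ('http://www.espncricinfo.com/series/13224/scorecard/426406/'
       'scotland-vs-england-only-odi-england-in-scotland-odi-match-2010')

d = url.split('-')

# `d in all_countries` asks whether the whole list d is one of the
# elements of all_countries. Those elements are strings, so this is False.
whole_list_check = d in all_countries

# Checking each piece separately, against lowercase names, is what was intended.
lowered = [c.lower() for c in all_countries]
matched_words = [word for word in d if word in lowered]

print(whole_list_check)
print(matched_words)
```

Note that the first piece of d still carries the URL prefix (".../426406/scotland"), so a country that comes right after the last slash would not match with this simple split.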

Answer 1: (score: 1)

Here is a simple way to do it (assuming one_day is a list of URLs and all_countries is a list of country names):

# (some example values for urls and country names) 
one_day = ['http://www.espncricinfo.com/...-vs-australia-only-odi-au...', 
           'http://www.espncricinfo.com/...scotland-vs-england-only-...'] 
all_countries = ['India', 'Ireland', 'Australia'] 

for day in one_day:
  for country in all_countries:
    if country.lower() in day:
      print(f'found a match for {country}: `{day}`')
      # or just: print(day) 

This works because in checks for substrings when both operands are strings, e.g.:

'Australia'.lower() in '...-vs-australia-only-odi-au...'
## True 

That is exactly what the condition country.lower() in day checks on each iteration of the inner loop.

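P.S. You could also split on '-' as in the original post, in case you are worried about something like 'USA' matching a URL that contains '-musac...' or similar. To do that, you could write: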

for day in one_day:
  day_split = day.split('-')
  for elem in day_split:
    if elem in [c.lower() for c in all_countries]:
      print(f'found a match: `{day}`')  
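One caveat with splitting on '-': a multi-word name such as 'West Indies' can never equal a single piece. The sketch below is one possible workaround, not part of the original answer. It assumes the country appears in the last path segment and that spaces in a name become hyphens there (e.g. 'west-indies'); the helper name countries_in_url and the second test URL are hypothetical.

```python
def countries_in_url(url, countries):
    """Return the countries whose hyphenated name appears as whole words in the slug."""
    slug = url.rsplit('/', 1)[-1].lower()   # last path segment of the URL
    padded = f'-{slug}-'                     # pad so every word is surrounded by '-'
    found = []
    for country in countries:
        # Assumption: a space in the name is written as '-' in the slug.
        needle = '-' + country.lower().replace(' ', '-') + '-'
        if needle in padded:
            found.append(country)
    return found


names = ['England', 'India', 'West Indies', 'USA']
real_url = ('http://www.espncricinfo.com/series/13224/scorecard/426406/'
            'scotland-vs-england-only-odi-england-in-scotland-odi-match-2010')
made_up_url = 'http://example.com/west-indies-vs-musac-2010'  # hypothetical URL

print(countries_in_url(real_url, names))
print(countries_in_url(made_up_url, names))
```

Because the comparison includes the surrounding hyphens, 'USA' does not match inside 'musac', and 'India' does not match inside 'indies'.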

Answer 2: (score: 1)

Or use a regular expression, which is more flexible ;)

import re

urls = ["http://www.espncricinfo.com/series/13224/scorecard/426406/scotland-vs-england-only-odi-england-in-scotland-odi-match-2010",
        "http://www.espncricinfo.com/series/13240/scorecard/426384/ireland-vs-australia-only-odi-australia-tour-of-england-and-ireland-2010",
        "http://www.espncricinfo.com/series/13240/scorecard/426384/titi-2010"
       ]

countries = ['England',
             'India',
             'West Indies']

for url in urls:
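    # r'\W' is a raw string, so the backslash reaches the regex engine intact;
    # re.search scans the whole URL, so no leading or trailing '.*?' is needed
    if re.search('|'.join(countries).replace(' ', r'\W'), url, re.IGNORECASE):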
        print(url)

Result:

http://www.espncricinfo.com/series/13224/scorecard/426406/scotland-vs-england-only-odi-england-in-scotland-odi-match-2010
http://www.espncricinfo.com/series/13240/scorecard/426384/ireland-vs-australia-only-odi-australia-tour-of-england-and-ireland-2010
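If the country names might ever contain regex metacharacters, or if partial-word matches are a concern, a slightly stricter variant could look like the sketch below. It is an illustration rather than the answerer's code: the names alternatives, pattern and mentions_country are made up here, and it assumes a space in a country name shows up in the URL as either '-' or a space.

```python
import re

countries = ['England', 'India', 'West Indies']

# Escape every word of every name, then allow '-' or ' ' between the words.
alternatives = ['[- ]'.join(re.escape(word) for word in name.split())
                for name in countries]

# The lookarounds reject matches that are glued to other letters.
pattern = re.compile(
    r'(?<![a-z])(?:' + '|'.join(alternatives) + r')(?![a-z])',
    re.IGNORECASE,
)


def mentions_country(url):
    """Return True if the URL contains one of the countries as a whole word."""
    return pattern.search(url) is not None


urls = [
    "http://www.espncricinfo.com/series/13224/scorecard/426406/scotland-vs-england-only-odi-england-in-scotland-odi-match-2010",
    "http://www.espncricinfo.com/series/13240/scorecard/426384/ireland-vs-australia-only-odi-australia-tour-of-england-and-ireland-2010",
    "http://www.espncricinfo.com/series/13240/scorecard/426384/titi-2010",
]

for url in urls:
    if mentions_country(url):
        print(url)
```

Compiling the pattern once outside the loop also avoids rebuilding the same string for every URL.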

Answer 3: (score: 0)

In your case:

all_countries=['England','India','West Indies']
for day in one_day:
    d=day.split('-')
    if d in all_countries:
        print(day)
    else:
        next

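You are doing d = day.split('-'), so d is itself a list; you need to iterate over d and check each value against the countries. One more point: all_countries holds capitalized country names, while the URLs are lowercase, so you need to convert the names to lowercase before checking the condition.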

The snippet below may help:

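all_countries=['England','India','West Indies']
# lowercase the names once, because the words in the URL are lowercase
lowered=[x.lower() for x in all_countries]

for day in one_day:
    d=day.split('-')
    for val in d:
        if val.lower() in lowered:
            print(day)
            break  # print each URL once, even if several words match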
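The same idea can be written more compactly with any(), which stops at the first matching word. This is only a sketch: the function name matching_urls is hypothetical, one_day is filled with the example URLs from this thread, and splitting on both '/' and '-' is an added assumption so that a country right after the last slash is also seen as its own word. Multi-word names such as 'West Indies' still will not match this way.

```python
import re


def matching_urls(urls, countries):
    """Return the URLs in which some '/'- or '-'-separated word is a country name."""
    lowered = {c.lower() for c in countries}   # set gives fast membership tests
    return [
        url for url in urls
        if any(word in lowered for word in re.split(r'[/-]', url.lower()))
    ]


all_countries = ['England', 'India', 'West Indies']
one_day = [
    "http://www.espncricinfo.com/series/13224/scorecard/426406/scotland-vs-england-only-odi-england-in-scotland-odi-match-2010",
    "http://www.espncricinfo.com/series/13240/scorecard/426384/ireland-vs-australia-only-odi-australia-tour-of-england-and-ireland-2010",
    "http://www.espncricinfo.com/series/13240/scorecard/426384/titi-2010",
]

for day in matching_urls(one_day, all_countries):
    print(day)
```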