Question

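I have a list of URLs that I am trying to filter using specific keywords, say word1 and word2, and a list of stop words, say [stop1, stop2, stop3]. Is there a way to filter the links without using many if conditions? When I use an if condition for each stop word I get the correct output, but that does not look like a feasible option. Here is the brute-force approach: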

for link in url:
    # test each keyword separately; "word1 or word2 in link" only tests word2
    if word1 in link or word2 in link:
        if stop1 not in link:
            if stop2 not in link:
                if stop3 not in link:
                    links.append(link)

Answer 1

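If I were in your situation, I would consider a couple of options.

You can use a list comprehension with the built-in any and all functions to filter the unwanted URLs out of the list: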

urls = ['http://somewebsite.tld/word',
        'http://somewebsite.tld/word1',
        'http://somewebsite.tld/word1/stop3',
        'http://somewebsite.tld/word2',
        'http://somewebsite.tld/word2/stop2',
        'http://somewebsite.tld/word3',
        'http://somewebsite.tld/stop3/word1',
        'http://somewebsite.tld/stop4/word1']

includes = ['word1', 'word2']
excludes = ['stop1', 'stop2', 'stop3']

filtered_url_list = [url for url in urls if any(include in url for include in includes) if all(exclude not in url for exclude in excludes)]
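
The comprehension above can also be wrapped in a small helper so that the keyword and stop-word lists are passed in instead of being hard-coded. This is only a sketch: the name `filter_urls` and its parameters are made up for illustration, and it assumes plain substring matching, as in the comprehension.

```python
def filter_urls(urls, includes, excludes):
    """Keep URLs that contain at least one keyword and no stop word.

    Hypothetical helper; matching is by plain substring.
    """
    return [
        url for url in urls
        if any(word in url for word in includes)        # at least one keyword present
        and not any(stop in url for stop in excludes)   # no stop word present
    ]


urls = ['http://somewebsite.tld/word',
        'http://somewebsite.tld/word1',
        'http://somewebsite.tld/word1/stop3',
        'http://somewebsite.tld/word2',
        'http://somewebsite.tld/word2/stop2',
        'http://somewebsite.tld/word3',
        'http://somewebsite.tld/stop3/word1',
        'http://somewebsite.tld/stop4/word1']

result = filter_urls(urls, ['word1', 'word2'], ['stop1', 'stop2', 'stop3'])
print(result)
# ['http://somewebsite.tld/word1', 'http://somewebsite.tld/word2', 'http://somewebsite.tld/stop4/word1']
```

Note that 'stop4/word1' is kept because 'stop4' is not one of the stop words, and substring matching does not treat it as one.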

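Or you can write a function that takes a single URL as its argument and returns True for the URLs you want to keep and False for those you don't, then pass that function, together with the unfiltered list of URLs, to the built-in filter function: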

def urlfilter(url):
    includes = ['word1', 'word2']
    excludes = ['stop1', 'stop2', 'stop3']
    for include in includes:
        if include in url:
            for exclude in excludes:
                if exclude in url:
                    return False
            else:
                # for/else: reached only when no stop word was found
                return True
    # no keyword matched at all
    return False

urls = ['http://somewebsite.tld/word',
        'http://somewebsite.tld/word1',
        'http://somewebsite.tld/word1/stop3',
        'http://somewebsite.tld/word2',
        'http://somewebsite.tld/word2/stop2',
        'http://somewebsite.tld/word3',
        'http://somewebsite.tld/stop3/word1',
        'http://somewebsite.tld/stop4/word1']

# list() is needed on Python 3, where filter() returns a lazy iterator
filtered_url_list = list(filter(urlfilter, urls))
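
If the keyword and stop-word lists change between calls, one possible variation is to build the predicate from them instead of hard-coding the lists inside the function. The sketch below assumes Python 3 and substring matching; `make_url_filter` and `keep` are hypothetical names, not part of the answer above.

```python
def make_url_filter(includes, excludes):
    """Return a predicate suitable for the built-in filter() (hypothetical helper)."""
    def keep(url):
        has_keyword = any(word in url for word in includes)
        has_stop = any(stop in url for stop in excludes)
        return has_keyword and not has_stop
    return keep


urls = ['http://somewebsite.tld/word1',
        'http://somewebsite.tld/word1/stop3',
        'http://somewebsite.tld/word3']

keep = make_url_filter(['word1', 'word2'], ['stop1', 'stop2', 'stop3'])
# filter() is lazy on Python 3, so wrap it in list() to materialize the result
filtered = list(filter(keep, urls))
print(filtered)
# ['http://somewebsite.tld/word1']
```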

Answer 2

It would help if you could give a concrete example. Suppose we take URLs like the ones in urlList below:

def urlSearch():
    end_words = ['gmail', 'finance']
    Key_word = ['google']
    urlList = ['google.com//d/gmail', 'google.com/finance', 'google.com/sports', 'google.com/search']
    for i in urlList:
        main_part = i.split('/')
        # reset per URL so words from a previous URL cannot leak into this one
        word = []
        # only URLs whose last path segment is an end word are considered
        if main_part[-1] in end_words:
            for k in main_part[:-1]:
                for j in k.split('.'):
                    word.append(j)
            print(word)
        # report the URL if any keyword is among the collected words
        for p in Key_word:
            if p in word:
                print("Url is: " + i)

urlSearch()

Answer 3

I would use sets and a list comprehension:

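import re

must_in = set([word1, word2])
musnt_in = set([stop1, stop2, stop3])
# set(x) on a string yields single characters, so split each URL into tokens first
links = [x for x in url
         if must_in & set(re.split(r'\W+', x))
         and not (musnt_in & set(re.split(r'\W+', x)))]
print(links)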

The code above works for any number of keywords and stop words; it is not limited to two keywords (word1, word2) and three stop words (stop1, stop2, stop3).
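
For set intersection to work on URLs, each URL has to be turned into a set of words first. A minimal runnable sketch of that idea follows, assuming a URL can be tokenized by splitting on anything that is not a letter or digit; the helper names `tokens` and `filter_by_tokens` and the sample URLs are made up for illustration. Unlike substring matching, this compares whole tokens, so 'word10' does not count as 'word1'.

```python
import re


def tokens(url):
    """Split a URL into a set of alphanumeric tokens (assumed tokenization rule)."""
    return set(t for t in re.split(r'[^A-Za-z0-9]+', url) if t)


def filter_by_tokens(urls, must_in, must_not_in):
    """Keep URLs sharing at least one token with must_in and none with must_not_in."""
    must_in = set(must_in)
    must_not_in = set(must_not_in)
    return [u for u in urls
            if must_in & tokens(u) and must_not_in.isdisjoint(tokens(u))]


urls = ['http://somewebsite.tld/word1',
        'http://somewebsite.tld/word1/stop3',
        'http://somewebsite.tld/word10',
        'http://somewebsite.tld/word2?ref=stop4']

links = filter_by_tokens(urls, ['word1', 'word2'], ['stop1', 'stop2', 'stop3'])
print(links)
# ['http://somewebsite.tld/word1', 'http://somewebsite.tld/word2?ref=stop4']
```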

Python word matching

3 answers: