I want to remove the URLs that contain any of the keywords in a given list. In my case, that means removing every URL that contains 'sale' or 'new'.
Test data:
url_list = ['https://www.test.com/men-fashion/', 'https://www.test.com/men-shirts', 'https://www.test.com/sale-fashion/', 'https://www.test.com/new-fashion/']
My substrings are:
to_remove = ['sale','new']
I tried to do this with a list comprehension combined with any(), but it keeps only the URLs that match my "to_remove" list. I want the opposite result.
url_list[:] = [url for url in url_list if any(substring in url for substring in to_remove)]
print(url_list)
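The any() attempt above is very close: the condition only needs to be negated, so that a URL is kept when none of the substrings occurs in it. A minimal sketch using the same test data:

```python
url_list = ['https://www.test.com/men-fashion/', 'https://www.test.com/men-shirts',
            'https://www.test.com/sale-fashion/', 'https://www.test.com/new-fashion/']
to_remove = ['sale', 'new']

# Keep a URL only if NONE of the substrings in to_remove occurs in it
url_list[:] = [url for url in url_list if not any(substring in url for substring in to_remove)]
print(url_list)  # ['https://www.test.com/men-fashion/', 'https://www.test.com/men-shirts']
```

The slice assignment `url_list[:] = ...` updates the existing list in place, so any other references to it see the filtered result.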
Answer (score: 0)
One way, using a regular expression:
import re
url_list = ['https://www.test.com/men-fashion/', 'https://www.test.com/men-shirts', 'https://www.test.com/sale-fashion/', 'https://www.test.com/new-fashion/']
to_remove = ['sale','new']
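# Escape each keyword so regex metacharacters in it are matched literally
pattern = re.compile("|".join(map(re.escape, to_remove)))
result = [url for url in url_list if not pattern.search(url)]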
print(result)
Output:
['https://www.test.com/men-fashion/', 'https://www.test.com/men-shirts']
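Both the `in` check and the regex match plain substrings, so they would also drop a URL such as a hypothetical `https://www.test.com/renewal-offers/`, because "new" occurs inside "renewal". If that is not wanted, one option is to compare against whole words in the URL path only. The sketch below assumes keywords appear in the path separated by `/`, `-` or `_`; `has_keyword` is a hypothetical helper, not a library function.

```python
import re
from urllib.parse import urlsplit

url_list = ['https://www.test.com/men-fashion/', 'https://www.test.com/men-shirts',
            'https://www.test.com/sale-fashion/', 'https://www.test.com/new-fashion/']
to_remove = ['sale', 'new']


def has_keyword(url, keywords):
    """Hypothetical helper: True if any keyword is a whole word in the URL path."""
    # Look only at the path, so the host name (e.g. www.test.com) is ignored
    path = urlsplit(url).path
    # Split the path on '/', '-' and '_' and drop empty pieces
    words = {w.lower() for w in re.split(r"[/\-_]+", path) if w}
    return any(k.lower() in words for k in keywords)


result = [url for url in url_list if not has_keyword(url, to_remove)]
print(result)  # ['https://www.test.com/men-fashion/', 'https://www.test.com/men-shirts']
```

For the test data this gives the same result as the regex version; the difference only shows up when a keyword is part of a longer word.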