I have a list of URLs, for example:
www.google.com
www.yahoo.fr
www.stackoverflow.com
I want to remove every URL that contains the string "oo" or the string "flow".
I wrote a Python function:
def my_function(param1, param2,
                param3, param4, liste_to_delete, liste2_to_delete):
    status = True
    SQL_CONSTANT = "url not like '%"
    URL_SEP = ";"
    # getFirstList
    broadcastListe1String = ""
    listtodelete = liste2_to_delete.split(URL_SEP)
    for url in listtodelete:
        broadcastListe1String = SQL_CONSTANT + url + "%'"
        if listtodelete.index(url) != len(listtodelete) - 1:
            broadcastListe1String = broadcastListe1String + " AND "
    my_broadcast = sc.broadcast(broadcastListe1String)
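Then I ran:
DataFrame = my_DataFrame.where(my_broadcast.value)
The function only takes effect from the second element of the list onwards; the first element is ignored.
How can I change my function so that it also removes URLs matching the first element of the list? I hope this is clear. Thanks.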
Answer 0 (score: 1)
I think you can use the filter function like this:
filter(lambda x: 'oo' not in x and 'flow' not in x, lst)
For example:
lst = ['www.google.com',
       'www.yahoo.fr',
       'www.stackoverflow.com',
       'www.duckduck.com',
       'www.amazon.com',
       ]
# list() is needed in Python 3, where filter() returns an iterator
filtered_lst = list(filter(lambda x: 'oo' not in x and 'flow' not in x, lst))
# filtered_lst == ['www.duckduck.com', 'www.amazon.com']
Or:
lst = ['www.google.com',
       'www.yahoo.fr',
       'www.stackoverflow.com',
       'www.duckduck.com',
       'www.amazon.com',
       ]
ex_words = ['oo', 'flow']
# keep a URL only if none of the excluded words occurs in it
filtered_lst = list(filter(lambda x: all(w not in x for w in ex_words), lst))
# filtered_lst == ['www.duckduck.com', 'www.amazon.com']
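The same filtering can also be written as a list comprehension, which always produces a list (in Python 3, filter() returns a lazy iterator instead). This is only a minimal sketch: the function name remove_matching and its parameter names are made up for illustration and do not come from the question.

```python
def remove_matching(urls, excluded_substrings):
    """Return the URLs that contain none of the excluded substrings."""
    # any() is True as soon as one excluded substring occurs in the URL,
    # so "not any(...)" keeps only the URLs with no match at all.
    return [url for url in urls
            if not any(s in url for s in excluded_substrings)]


urls = ['www.google.com',
        'www.yahoo.fr',
        'www.stackoverflow.com',
        'www.duckduck.com',
        'www.amazon.com']

print(remove_matching(urls, ['oo', 'flow']))
# prints ['www.duckduck.com', 'www.amazon.com']
```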
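Answer 1 (score: 0)
# names chosen so that the built-ins filter and list are not shadowed
bad_words = ['oo', 'flow']
urls = ['www.google.com', 'www.yahoo.fr', 'www.stackoverflow.com', 'www.something.com']
for val in urls:
    # print the URL only if it contains none of the bad words
    if not any(bad_word in val for bad_word in bad_words):
        print(val)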
Output:
www.something.com
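For the DataFrame version in the question, the loop assigns a fresh string to broadcastListe1String on every iteration, so only the clause for the last pattern survives. One possible fix is to build all the clauses first and combine them with str.join. The sketch below covers only the string construction, so it runs without Spark; the helper name build_exclusion_condition and its column parameter are hypothetical, and it assumes the patterns contain no single quotes or LIKE wildcards that would need escaping.

```python
def build_exclusion_condition(patterns, column="url"):
    """Build a SQL condition that holds only if `column` contains no pattern."""
    # Skip empty entries, e.g. those produced by a trailing ";" in the input.
    clauses = [f"{column} not like '%{p}%'" for p in patterns if p]
    # join() puts " AND " between clauses only, so no index check is needed
    # and no clause overwrites an earlier one.
    return " AND ".join(clauses)


condition = build_exclusion_condition("oo;flow".split(";"))
print(condition)
# prints: url not like '%oo%' AND url not like '%flow%'
```

The resulting string could then be passed to my_DataFrame.where(condition), since where accepts a SQL expression string. The condition is only used on the driver here, so the broadcast variable does not appear to be required.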