delete = ["man", "eat"]
item_list = ['sharper_task|$none_venue|man', 'sharper_task|man_venue|king', 'sharper_task|king_venue|world', 'sharper_task|world_venue|dont', 'sharper_task|を_venue|eater', 'sharper_task|eater_venue|todo', 'sharper_task|todo_venue|,']
My code:
lst = []
for x in item_list:
    if not any(y in x for y in delete):
        lst.append([x, x])
print(lst)
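However, this approach messes up my output. For example, with delete = ["man", "eat"], the word "eat" is not the same as "eater" in item_list, but the item is still filtered out. I use if not any(y in x ...), and "in" returns True because "eat" is contained in "eater". What I want is not a word contained inside another word but an exact match: "eater" should match "eater" and "man" should match "man", but "eat" should not match "eater".
Is there a way to match exactly instead of partially? My current code matches partially, which gives wrong results when delete contains many words that are parts of other words.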
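The difference can be seen in isolation. This is a minimal sketch, assuming (as in item_list) that words are separated only by "|" or "_": splitting on those separators first turns the substring test into an exact membership test.

```python
import re

# `in` between two strings is a substring test, so "eat" is found inside "eater".
print("eat" in "eater")  # True

# Assumption: "|" and "_" are the only word separators, as in item_list.
item = 'sharper_task|を_venue|eater'
tokens = re.split(r'[|_]', item)
print(tokens)  # ['sharper', 'task', 'を', 'venue', 'eater']

# `in` on a list compares whole elements with ==, so only exact words match.
print("eat" in tokens)    # False
print("eater" in tokens)  # True
```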
Answer 0 (score: 1)
Then you can check the strings for an exact match:
delete = ["man", "eat"]
item_list = ['sharper_task|$none_venue|man', 'sharper_task|man_venue|king', 'sharper_task|king_venue|world', 'sharper_task|world_venue|dont', 'sharper_task|を_venue|eater', 'sharper_task|eater_venue|todo', 'sharper_task|todo_venue|,']
lst = []
for x in item_list:
    if not any(y == x for y in delete):
        lst.append([x, x])
print(lst)
# [['sharper_task|$none_venue|man', 'sharper_task|$none_venue|man'], ['sharper_task|man_venue|king', 'sharper_task|man_venue|king'], ['sharper_task|king_venue|world', 'sharper_task|king_venue|world'], ['sharper_task|world_venue|dont', 'sharper_task|world_venue|dont'], ['sharper_task|を_venue|eater', 'sharper_task|を_venue|eater'], ['sharper_task|eater_venue|todo', 'sharper_task|eater_venue|todo'], ['sharper_task|todo_venue|,', 'sharper_task|todo_venue|,']]
Note: "|" does not act as an "or" operator inside a string such as 'sharper_task|eater_venue|todo'; it is just an ordinary character.
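A quick check of why this version keeps every item with the data above: y == x compares each delete word with the whole item string, and no item is exactly "man" or "eat". The pipe only becomes a separator once the string is split on it.

```python
delete = ["man", "eat"]
item_list = ['sharper_task|$none_venue|man', 'sharper_task|man_venue|king', 'sharper_task|king_venue|world', 'sharper_task|world_venue|dont', 'sharper_task|を_venue|eater', 'sharper_task|eater_venue|todo', 'sharper_task|todo_venue|,']

# `y == x` compares a delete word with the *entire* item string, so nothing matches here.
matched = [x for x in item_list if any(y == x for y in delete)]
print(matched)  # []

# "|" inside a string literal is an ordinary character; str.split turns it into a separator.
fields = 'sharper_task|eater_venue|todo'.split('|')
print(fields)  # ['sharper_task', 'eater_venue', 'todo']
```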
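Answer 1 (score: 1)
You can first split each string into substrings on "|", split each substring further on "_", and then use the in operator to test whether an item in delete equals one of the resulting pieces:
lst = []
for x in item_list:
    if not any(y in s.split('_') for s in x.split('|') for y in delete):
        lst.append([x, x])
print(lst)
This outputs:
[['sharper_task|king_venue|world', 'sharper_task|king_venue|world'], ['sharper_task|world_venue|dont', 'sharper_task|world_venue|dont'], ['sharper_task|を_venue|eater', 'sharper_task|を_venue|eater'], ['sharper_task|eater_venue|todo', 'sharper_task|eater_venue|todo'], ['sharper_task|todo_venue|,', 'sharper_task|todo_venue|,']]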
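The same idea can be written with a single split. This is a minimal sketch, assuming "|" and "_" are the only separators; tokens is a hypothetical helper name, and re.split with a character class stands in for the two nested split calls.

```python
import re

delete = {"man", "eat"}  # a set, so isdisjoint() below is a cheap check
item_list = ['sharper_task|$none_venue|man', 'sharper_task|man_venue|king', 'sharper_task|king_venue|world', 'sharper_task|world_venue|dont', 'sharper_task|を_venue|eater', 'sharper_task|eater_venue|todo', 'sharper_task|todo_venue|,']

def tokens(item):
    """Hypothetical helper: split an item on both '|' and '_' in one pass."""
    return set(re.split(r'[|_]', item))

# Keep an item only if none of its tokens equals a delete word exactly.
lst = [[x, x] for x in item_list if tokens(x).isdisjoint(delete)]
print(lst)
```

Like the nested-split version, this drops items where "man" is a whole token (including "man_venue") but keeps items containing "eater".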
Answer 2 (score: 0)
Assuming you want to split on the pipe character:
delete = ["man", "eat"]
item_list = ['sharper_task|$none_venue|man', 'sharper_task|man_venue|king', 'sharper_task|king_venue|world', 'sharper_task|world_venue|dont', 'sharper_task|を_venue|eater', 'sharper_task|eater_venue|todo', 'sharper_task|todo_venue|,']
# Keep items in which no delete word appears as a whole "|"-separated field
lst = [item
       for item in item_list
       if not any(word in item.split('|') for word in delete)]
print(lst)
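Splitting only on the pipe treats "man_venue" as a single field, so this approach keeps 'sharper_task|man_venue|king', unlike splitting on "_" as well. Which behavior is right depends on whether "_" should separate words, which the question leaves open. A small sketch of the difference:

```python
delete = ["man", "eat"]
item_list = ['sharper_task|$none_venue|man', 'sharper_task|man_venue|king', 'sharper_task|king_venue|world', 'sharper_task|world_venue|dont', 'sharper_task|を_venue|eater', 'sharper_task|eater_venue|todo', 'sharper_task|todo_venue|,']

fields = 'sharper_task|man_venue|king'.split('|')
print(fields)  # ['sharper_task', 'man_venue', 'king']
print(any(word in fields for word in delete))  # False: "man_venue" != "man"

# Items kept when only whole "|"-separated fields are compared
kept = [item for item in item_list
        if not any(word in item.split('|') for word in delete)]
print(len(kept))  # 6: only 'sharper_task|$none_venue|man' is removed
```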
Answer 3 (score: 0)
Try the following:
import re

del_list = ["man", "eat"]
# One alternation of all delete words; \b requires a word boundary on each side,
# and re.escape guards against regex metacharacters in the words
regex = '|'.join([r'\b' + re.escape(y) + r'\b' for y in del_list])
item_list = ['sharper_task|$none_venue|man', 'sharper_task|man_venue|king', 'sharper_task|king_venue|world', 'sharper_task|world_venue|dont', 'sharper_task|を_venue|eater', 'sharper_task|eater_venue|todo', 'sharper_task|todo_venue|,']
lst = []
for x in item_list:
    if not re.search(regex, x):
        lst.append([x, x])
print(lst)
This outputs:
[['sharper_task|man_venue|king', 'sharper_task|man_venue|king'], ['sharper_task|king_venue|world', 'sharper_task|king_venue|world'], ['sharper_task|world_venue|dont', 'sharper_task|world_venue|dont'], ['sharper_task|を_venue|eater', 'sharper_task|を_venue|eater'], ['sharper_task|eater_venue|todo', 'sharper_task|eater_venue|todo'], ['sharper_task|todo_venue|,', 'sharper_task|todo_venue|,']]
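Using a single combined regex, instead of checking the delete words one at a time in a loop, ensures that an item_list element removed because it matches one delete word is not added back to the output because it fails to match another delete word.
regex = '|'.join(...): this builds one pattern from raw (r'') strings containing \b, which matches a word boundary, i.e. a position between a word character (a letter, a digit, or "_") and a non-word character or the start/end of the string. Because "_" is a word character, "man" inside "man_venue" is not matched. Read more about \b in the Python re module documentation.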
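Because "_" counts as a word character, \b alone cannot make "man_venue" match "man". If it should, one option (an assumption about the desired behavior, not part of the answer above) is to replace \b with lookarounds that treat any character other than a letter or digit as a separator:

```python
import re

del_list = ["man", "eat"]
item_list = ['sharper_task|$none_venue|man', 'sharper_task|man_venue|king', 'sharper_task|king_venue|world', 'sharper_task|world_venue|dont', 'sharper_task|を_venue|eater', 'sharper_task|eater_venue|todo', 'sharper_task|todo_venue|,']

# "_" is a word character in Python's re, so \b finds no boundary in "man_venue".
print(re.search(r'\bman\b', 'sharper_task|man_venue|king'))  # None

# Hypothetical variant: [^\W_] means "a word character other than _", i.e. a letter
# or digit. The lookarounds require that no letter or digit touches the word.
pattern = '|'.join(
    r'(?<![^\W_])' + re.escape(y) + r'(?![^\W_])' for y in del_list
)

lst = [[x, x] for x in item_list if not re.search(pattern, x)]
print(lst)
```

With this pattern, both "man" items are removed while "eater" still does not match "eat".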
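If we used two nested loops, one over del_list and one over item_list, the output would look like the following, which I think is wrong: the "man" item still appears once, because it does not match "eat", and every other item, which matches none of the words in del_list, appears twice: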
[['sharper_task|$none_venue|man', 'sharper_task|$none_venue|man'], ['sharper_task|man_venue|king', 'sharper_task|man_venue|king'], ['sharper_task|man_venue|king', 'sharper_task|man_venue|king'], ['sharper_task|king_venue|world', 'sharper_task|king_venue|world'], ['sharper_task|king_venue|world', 'sharper_task|king_venue|world'], ['sharper_task|world_venue|dont', 'sharper_task|world_venue|dont'], ['sharper_task|world_venue|dont', 'sharper_task|world_venue|dont'], ['sharper_task|を_venue|eater', 'sharper_task|を_venue|eater'], ['sharper_task|を_venue|eater', 'sharper_task|を_venue|eater'], ['sharper_task|eater_venue|todo', 'sharper_task|eater_venue|todo'], ['sharper_task|eater_venue|todo', 'sharper_task|eater_venue|todo'], ['sharper_task|todo_venue|,', 'sharper_task|todo_venue|,'], ['sharper_task|todo_venue|,', 'sharper_task|todo_venue|,']]
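A sketch that reproduces the two-loop output above (outer loop over item_list, inner loop over del_list, same \b pattern) next to a version that decides once per item with any(), which keeps the same items as the single combined regex:

```python
import re

del_list = ["man", "eat"]
item_list = ['sharper_task|$none_venue|man', 'sharper_task|man_venue|king', 'sharper_task|king_venue|world', 'sharper_task|world_venue|dont', 'sharper_task|を_venue|eater', 'sharper_task|eater_venue|todo', 'sharper_task|todo_venue|,']

def word_re(y):
    return r'\b' + re.escape(y) + r'\b'

# Nested loops: one append per (item, word) pair that does NOT match,
# so an item is added once for every delete word it fails to match.
nested = []
for x in item_list:
    for y in del_list:
        if not re.search(word_re(y), x):
            nested.append([x, x])
print(len(nested))  # 13

# Deciding once per item appends each surviving item at most once.
combined = [[x, x] for x in item_list
            if not any(re.search(word_re(y), x) for y in del_list)]
print(len(combined))  # 6
```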