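How do I remove words from the list values of a specific dictionary key?

Asked: 2019-05-22 08:42:40

Tags: python dictionary

I need to remove a list of words from the values of a specific key in a list of dictionaries.

Here is a sample of my data: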

words = ['cloves', 'packed']

data = [{'title': 'Simple Enchiladas Verdes',
         'prep_time': '15 min',
         'cook_time': '30 min',
         'ingredients': ['chicken breast', 'tomato sauce', 'garlic cloves', 'fresh packed cilantro'],
         'instructions': ['some text...'],
         'category': 'dessert',
         'cuisine': 'thai', 
         'article': ['some text...']
        },
        {...}, {...}]

Desired output:

data = [{'title': 'Simple Enchiladas Verdes',
         'prep_time': '15 min',
         'cook_time': '30 min',
         'ingredients': ['chicken breast', 'tomato sauce', 'garlic', 'fresh cilantro']
        },
        {...}, {...}]

I have tried several pieces of code:

import re

remove = '|'.join(words)
regex = re.compile(r'\b('+remove+r')\b', flags=re.IGNORECASE)

for dct in data:
    dct['ingredients']= list(filter(lambda x: regex.sub('', x), dct['ingredients']))

But this returns the following error: TypeError: sub() missing 1 required positional argument: 'string'

Other code I have tried:

for dct in data:
    dct['ingredients']= list(filter(lambda x: x != words, dct['ingredients']))

for dct in data:
    dct['ingredients']=[[el for el in string if el in words ] for string in dct['ingredients']]

for dct in data:
    for string in dct['ingredients']:
        dct['ingredients'] = list(filter(lambda x: x not in words, dct['ingredients']))

But none of them solved my problem.

4 Answers:

Answer 0 (score: 2)

Why not simply use a list comprehension inside a dict comprehension?

data = [{k:([' '.join([s for s in x.split() if s not in words]) for x in v] if k == 'ingredients' else v) for k, v in i.items()} for i in data]
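
For readability, the one-liner above can be unrolled into a small helper. This is only a sketch: the name `strip_words` and its `key` parameter are hypothetical, and unlike the one-liner it returns new dictionaries instead of rebinding `data`.

```python
def strip_words(records, words, key='ingredients'):
    """Return new dicts in which `words` are removed from each string under `key`."""
    banned = set(words)  # a set makes the membership test cheap
    result = []
    for record in records:
        new_record = dict(record)  # shallow copy so the input is left untouched
        if key in new_record:
            new_record[key] = [
                ' '.join(w for w in text.split() if w not in banned)
                for text in new_record[key]
            ]
        result.append(new_record)
    return result


data = [{'title': 'Simple Enchiladas Verdes',
         'ingredients': ['chicken breast', 'tomato sauce',
                         'garlic cloves', 'fresh packed cilantro']}]
cleaned = strip_words(data, ['cloves', 'packed'])
print(cleaned[0]['ingredients'])
# ['chicken breast', 'tomato sauce', 'garlic', 'fresh cilantro']
```

Like the one-liner, this only removes whole, space-separated words and compares them case-sensitively.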

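Answer 1 (score: 0)

In your re.sub approach, you should use map rather than filter (you do not want to filter out individual items, but to replace each whole string with the result of re.sub):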

for dct in data:
    dct['ingredients']= list(map(lambda x: regex.sub('', x), dct['ingredients']))

Or, as a list comprehension, which is probably more readable:

    dct['ingredients'] = [regex.sub("", x) for x in dct['ingredients']]
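
To see the difference between filter and map here, the following small check may help. It is only an illustration with a made-up two-item list: filter uses the lambda's result as a truth test and keeps the original item, while map returns the result itself.

```python
import re

regex = re.compile(r'\b(cloves|packed)\b', flags=re.IGNORECASE)
ingredients = ['garlic cloves', 'packed']

# filter() keeps the ORIGINAL item whenever the lambda result is truthy,
# so 'garlic cloves' survives unchanged and 'packed' (-> '') is dropped.
kept = list(filter(lambda x: regex.sub('', x), ingredients))

# map() returns the lambda result, i.e. the substituted strings.
replaced = list(map(lambda x: regex.sub('', x), ingredients))

print(kept)      # ['garlic cloves']
print(replaced)  # ['garlic ', '']
```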

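However, both the map and the list-comprehension versions leave some extra spaces behind. If the words are always separated by spaces, you can simply use split and join (which is faster if words is a set):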

for dct in data:
    dct['ingredients'] = [' '.join(w for w in string.split() if w not in words)
                          for string in dct['ingredients']]
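
As a rough sketch of both options side by side: the first uses a set with split and join, the second keeps the case-insensitive regex and collapses the leftover whitespace afterwards. The use of re.escape and the non-capturing group are my additions, not part of the question's code.

```python
import re

words = {'cloves', 'packed'}  # a set gives fast membership tests
ingredients = ['chicken breast', 'garlic cloves', 'fresh packed cilantro']

# Variant 1: drop whole, space-separated words (case-sensitive).
cleaned = [' '.join(w for w in s.split() if w not in words)
           for s in ingredients]

# Variant 2: keep the regex (case-insensitive, word boundaries), then
# normalise whitespace with split/join to remove the extra spaces.
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b',
                     flags=re.IGNORECASE)
cleaned_re = [' '.join(pattern.sub('', s).split()) for s in ingredients]

print(cleaned)     # ['chicken breast', 'garlic', 'fresh cilantro']
print(cleaned_re)  # ['chicken breast', 'garlic', 'fresh cilantro']
```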

Answer 2 (score: 0)

words = ['cloves', 'packed']

data = [{'title': 'Simple Enchiladas Verdes',
         'prep_time': '15 min',
         'cook_time': '30 min',
         'ingredients': ['chicken breast', 'tomato sauce', 'garlic cloves', 'fresh packed cilantro']}
        ]
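for recipe in data:
    # Join all ingredients into one string, with a marker between them
    joined = ' @! '.join(recipe['ingredients'])
    for k in words:
        # Plain substring replacement; this can leave double spaces behind
        joined = joined.replace(k, '').strip()

    # Split on the marker again and trim each piece
    recipe['ingredients'] = [part.strip() for part in joined.split('@!')]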

Output:

[{'title': 'Simple Enchiladas Verdes',
  'prep_time': '15 min',
  'cook_time': '30 min',
  'ingredients': ['chicken breast',
   'tomato sauce',
   'garlic',
   'fresh  cilantro']}]
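
One caveat with str.replace is that it removes the text wherever it occurs, including inside longer words, and it leaves the surrounding spaces in place (hence the double space in 'fresh  cilantro' above). The sketch below uses a made-up input string to contrast it with filtering whole tokens.

```python
words = ['packed']
text = 'unpacked fresh packed cilantro'  # hypothetical input for illustration

# str.replace removes every occurrence, even inside a longer word.
by_replace = text
for w in words:
    by_replace = by_replace.replace(w, '')

# Token filtering only drops whole, space-separated words.
by_tokens = ' '.join(t for t in text.split() if t not in words)

print(by_replace)  # un fresh  cilantro
print(by_tokens)   # unpacked fresh cilantro
```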

Answer 3 (score: 0)

words = ['cloves', 'packed']

data = [{'title': 'Simple Enchiladas Verdes',
         'prep_time': '15 min',
         'cook_time': '30 min',
         'ingredients': ['chicken breast', 'tomato sauce', 'garlic cloves', 'fresh packed cilantro']
        },
        {'title': 'Simple Enchiladas Verdes11',
         'prep_time': '15 min11',
         'cook_time': '30 min11',
         'ingredients': ['chicken breast1', '1tomato sauce', '1garlic cloves', '1fresh packed cilantro']}
        ]

for d in data:
    n = []  # start a fresh list for each recipe so results do not accumulate
    for item in d['ingredients']:
        for word in words:
            item = item.replace(word, '')
        n.append(item)
    d['ingredients'] = n

print(d)  # prints the last recipe only

Output:

{'title': 'Simple Enchiladas Verdes11', 'prep_time': '15 min11', 'cook_time': '30 min11', 'ingredients': ['chicken breast1', '1tomato sauce', '1garlic ', '1fresh  cilantro']}