Question

I need to remove a list of words from the values of a specific key in a list of dictionaries.

Here is a sample of my data:

words = ['cloves', 'packed']

data = [{'title': 'Simple Enchiladas Verdes',
         'prep_time': '15 min',
         'cook_time': '30 min',
         'ingredients': ['chicken breast', 'tomato sauce', 'garlic cloves', 'fresh packed cilantro'],
         'instructions': ['some text...'],
         'category': 'dessert',
         'cuisine': 'thai', 
         'article': ['some text...']
        },
        {...}, {...}]

Desired output:

data = [{'title': 'Simple Enchiladas Verdes',
         'prep_time': '15 min',
         'cook_time': '30 min',
         'ingredients': ['chicken breast', 'tomato sauce', 'garlic', 'fresh cilantro']
        },
        {...}, {...}]

I tried several different approaches:

remove = '|'.join(words)
regex = re.compile(r'\b('+remove+r')\b', flags=re.IGNORECASE)

for dct in data:
    dct['ingredients']= list(filter(lambda x: regex.sub('', x), dct['ingredients']))

But this returns the following error: TypeError: sub() missing 1 required positional argument: 'string'

Other code I tried:

for dct in data:
    dct['ingredients']= list(filter(lambda x: x != words, dct['ingredients']))

for dct in data:
    dct['ingredients']=[[el for el in string if el in words ] for string in dct['ingredients']]

for dct in data:
    for string in dct['ingredients']:
        dct['ingredients'] = list(filter(lambda x: x not in words, dct['ingredients']))

But none of them solved my problem.

Answer 1

Why not use a list comprehension together with a dict comprehension?

data = [{k:([' '.join([s for s in x.split() if s not in words]) for x in v] if k == 'ingredients' else v) for k, v in i.items()} for i in data]
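The one-liner above can be hard to scan. Here is a minimal sketch of the same idea spelled out with a helper, assuming the words in each ingredient are separated by whitespace. The names `strip_words` and `cleaned` are hypothetical and not part of the original answer, and the sample data is trimmed to two keys for brevity.

```python
words = {'cloves', 'packed'}  # a set gives fast membership tests

data = [{'title': 'Simple Enchiladas Verdes',
         'ingredients': ['chicken breast', 'tomato sauce',
                         'garlic cloves', 'fresh packed cilantro']}]


def strip_words(text, unwanted):
    """Return text without the whitespace-separated tokens found in unwanted."""
    return ' '.join(tok for tok in text.split() if tok not in unwanted)


# Build new dictionaries; only the 'ingredients' value is rewritten,
# every other key is copied over unchanged.
cleaned = [
    {key: ([strip_words(s, words) for s in value] if key == 'ingredients' else value)
     for key, value in recipe.items()}
    for recipe in data
]

print(cleaned[0]['ingredients'])
```

Because the comprehension builds new dictionaries, the original `data` list is left untouched, which can be handy while experimenting.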

Answer 2

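With your re.sub approach, you should use map rather than filter (you are not trying to filter out individual words; you want to replace each whole string with the result of re.sub):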

for dct in data:
    dct['ingredients']= list(map(lambda x: regex.sub('', x), dct['ingredients']))

Or, as a list comprehension, which is probably more readable:

    dct['ingredients'] = [regex.sub("", x) for x in dct['ingredients']]

However, both leave some extra spaces behind. If the words are always separated by spaces, you can just use split and join (faster if words is a set):

for dct in data:
    dct['ingredients'] = [' '.join(w for w in string.split() if w not in words)
                          for string in dct['ingredients']]
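To make the filter/map difference concrete, and to show one way of tidying the leftover spaces from the regex version, here is a small sketch. The variable names (`pattern`, `filtered`, `mapped`, `tidy`) are illustrative assumptions, and the use of `re.escape` is an added precaution that the question's code did not include.

```python
import re

words = ['cloves', 'packed']
ingredients = ['chicken breast', 'tomato sauce',
               'garlic cloves', 'fresh packed cilantro']

# re.escape guards against words that contain regex metacharacters.
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b',
                     flags=re.IGNORECASE)

# filter() only uses the lambda's result as a keep/drop test, so the
# original strings come back unchanged.
filtered = list(filter(lambda s: pattern.sub('', s), ingredients))

# map() keeps the substituted strings themselves.
mapped = list(map(lambda s: pattern.sub('', s), ingredients))

# split() with no arguments discards runs of whitespace, so re-joining
# collapses double spaces and trims the ends.
tidy = [' '.join(s.split()) for s in mapped]

print(filtered)
print(mapped)
print(tidy)
```

Unlike the plain split/join version, the regex here matches case-insensitively, so it would also remove a word such as 'Packed'.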

Answer 3

words = ['cloves', 'packed']

data = [{'title': 'Simple Enchiladas Verdes',
         'prep_time': '15 min',
         'cook_time': '30 min',
         'ingredients': ['chicken breast', 'tomato sauce', 'garlic cloves', 'fresh packed cilantro']}
        ]
for i in data:
    word = ' @! '.join(i['ingredients'])
    for k in words:
        word = word.replace(k,'').strip()

    i['ingredients']=[i.strip() for i in word.split('@!')]

Output

[{'title': 'Simple Enchiladas Verdes',
  'prep_time': '15 min',
  'cook_time': '30 min',
  'ingredients': ['chicken breast',
   'tomato sauce',
   'garlic',
   'fresh  cilantro']}]
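Note that the output above still contains a double space in 'fresh  cilantro', and that str.replace works on substrings rather than whole words. The sketch below illustrates both points on a made-up ingredient string (`sample` is hypothetical and not from the question) and compares it with a word-boundary regex followed by whitespace normalization.

```python
import re

words = ['cloves', 'packed']
sample = 'unpacked fresh packed cilantro'  # made-up string for illustration

# str.replace matches substrings, so it also cuts into 'unpacked'
# and leaves a double space where 'packed' used to be.
by_replace = sample
for w in words:
    by_replace = by_replace.replace(w, '')

# \b restricts the match to whole words; split/join then collapses
# the leftover whitespace.
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b')
by_regex = ' '.join(pattern.sub('', sample).split())

print(repr(by_replace))
print(repr(by_regex))
```

Whether the substring behavior matters depends on the word list; for the two words in the question it makes no difference.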

Answer 4

words = ['cloves', 'packed']

data = [{'title': 'Simple Enchiladas Verdes',
         'prep_time': '15 min',
         'cook_time': '30 min',
         'ingredients': ['chicken breast', 'tomato sauce', 'garlic cloves', 'fresh packed cilantro']
        },
        {'title': 'Simple Enchiladas Verdes11',
         'prep_time': '15 min11',
         'cook_time': '30 min11',
         'ingredients': ['chicken breast1', '1tomato sauce', '1garlic cloves', '1fresh packed cilantro']}
        ]

for d in data:
    n = []  # start a fresh list for each dictionary so results do not accumulate
    for item in d['ingredients']:
        for word in words:
            item = item.replace(word, '')
        n.append(item)
    d['ingredients'] = n

print(d)  # prints only the last dictionary in data

Output:

{'title': 'Simple Enchiladas Verdes11', 'prep_time': '15 min11', 'cook_time': '30 min11', 'ingredients': ['chicken breast1', '1tomato sauce', '1garlic ', '1fresh  cilantro']}
