I have a dictionary whose keys are recipe IDs and whose values are lists of ingredients:
recipe_dictionary = { 134: ['salt', 'chicken', 'tomato paste canned'],
523: ['toast whole grain', 'feta cheese', 'egg', 'salt'],
12: ['chicken', 'rice', 'parsley']}
I also have a static list of ingredients that I don't want repeated within a day:
non_repeatable_ingredients = ['egg', 'chicken', 'beef']
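Currently I loop over each value of the dictionary, then over the ingredient names, compare each name against the non_repeatable_ingredients list, and build a list of the shared words. My reduced dictionary should therefore look like this: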
reduced_recipe_dictionary = {134: ['chicken'],
                             523: ['egg'],
                             12: ['chicken']}
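This process takes a long time because my real dictionary and ingredient lists are very long. Is there a faster way than the method below?
Here is the get_reduced_meal_plans_dictionry method: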
from fuzzywuzzy import fuzz

reduced_meal_plans_dictionary = {}
# For each recipe
for recipe in meal_plans_dictionary:
    # Temporary list for the overlapping ingredients found in this recipe
    overlapped_ingredients_list = []
    # For each complete ingredient name in the recipe
    for ingredient_complete_name in meal_plans_dictionary[recipe]:
        # Clean up the ingredient name, as it sometimes contains commas, parentheses or spaces
        ingredient_string = ingredient_complete_name.replace(',', '').replace('(', '').replace(')', '').lower().strip()
        # Compare the ingredient name against each ingredient that should not be repeated in a day
        for each in PROTEIN_TAGS:
            # Compute the partial similarity
            partial_similarity = fuzz.partial_ratio(ingredient_string, each.lower())
            # Above 90 means one of the PROTEIN_TAGS ingredients exists in this recipe
            if partial_similarity > 90:
                # Collect such ingredients for this recipe
                overlapped_ingredients_list.append(each.lower())
    # Store the recipe ID as the key and the list of overlapping ingredients as the value
    reduced_meal_plans_dictionary[recipe] = overlapped_ingredients_list
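I use replace() and partial_ratio because the ingredient names are not clean. For example, an ingredient might be listed as "egg" or as "boiled egg".
Thanks.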
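If the dirty names usually differ from the tags only by extra words or punctuation (e.g. "boiled egg" vs "egg"), one possibly faster alternative to calling fuzz.partial_ratio for every (ingredient, tag) pair is to split each name into words once and look the words up in a set. This is a minimal sketch, assuming whole-word matches are good enough; the function name reduce_by_words and the sample names with punctuation and capitals are hypothetical:

```python
import re

# Hypothetical "dirty" sample data (punctuation, capitals, extra words).
recipe_dictionary = {
    134: ['salt', 'chicken', 'tomato paste (canned)'],
    523: ['toast, whole grain', 'feta cheese', 'Boiled Egg', 'salt'],
    12: ['chicken', 'rice', 'parsley'],
}
non_repeatable_ingredients = ['egg', 'chicken', 'beef']


def reduce_by_words(recipes, tags):
    """Hypothetical helper: keep, per recipe, the tags that appear as whole words."""
    tag_set = {t.lower() for t in tags}  # built once; O(1) average membership tests
    reduced = {}
    for recipe_id, ingredients in recipes.items():
        found = []
        for name in ingredients:
            # Keep only runs of letters, so commas and parentheses disappear.
            for word in re.findall(r'[a-z]+', name.lower()):
                if word in tag_set and word not in found:
                    found.append(word)
        reduced[recipe_id] = found
    return reduced


print(reduce_by_words(recipe_dictionary, non_repeatable_ingredients))
# {134: ['chicken'], 523: ['egg'], 12: ['chicken']}
```

This costs one set lookup per word instead of one fuzzy comparison per ingredient/tag pair, but it misses plurals ("eggs") and misspellings, which partial_ratio would still catch.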
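Answer 0 (score: 1)
Since the ingredients within a recipe are unique and their order doesn't matter, how about using sets instead of lists?
Membership tests on a set take O(1) time on average, whereas on a list they take O(n) time.
For example: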
recipe_dictionary = {
134: set(['salt', 'chicken', 'tomato paste canned']),
523: set(['toast whole grain', 'feta cheese', 'egg', 'salt']),
12: set(['chicken', 'rice', 'parsley'])
}
non_repeatable_ingredients = set(['egg', 'chicken', 'beef'])
You can test whether an element is in a set like this:
for ingredient in recipe_dictionary[134]:
    if ingredient in non_repeatable_ingredients:
        # do something with the match, e.g. collect it
        print(ingredient)
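Taking the set idea one step further, the whole reduced dictionary can be built with set intersection. A minimal sketch, assuming the ingredient names are already clean exact matches (the set literals are a rewrite of the sample data above): the & operator keeps only the elements present in both sets, and sorted() turns each result back into a list with a stable order.

```python
recipe_dictionary = {
    134: {'salt', 'chicken', 'tomato paste canned'},
    523: {'toast whole grain', 'feta cheese', 'egg', 'salt'},
    12: {'chicken', 'rice', 'parsley'},
}
non_repeatable_ingredients = {'egg', 'chicken', 'beef'}

# Intersect each recipe's ingredient set with the non-repeatable set.
reduced_recipe_dictionary = {
    recipe_id: sorted(ingredients & non_repeatable_ingredients)
    for recipe_id, ingredients in recipe_dictionary.items()
}
print(reduced_recipe_dictionary)
# {134: ['chicken'], 523: ['egg'], 12: ['chicken']}
```

Exact intersection will not match "boiled egg" to "egg", so on its own it only helps once the names have been normalized.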
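Answer 1 (score: 0)
By combining a regular expression with a defaultdict, you can find exactly what you need. This approach uses the regular expression to reduce the number of for loops required.
Note that I have changed key 12 to show that it picks up both matches.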
import re
from collections import defaultdict

recipe_dictionary = { 134: ['salt', 'chicken', 'tomato paste canned'],
523: ['toast whole grain', 'feta cheese', 'egg', 'salt'],
12: ['whole chicken', 'rice', 'parsley', 'egg']}
non_repeatable_ingredients = ['egg', 'chicken', 'beef']
non_repeat = '(' + '|'.join(non_repeatable_ingredients) + ')'
d = defaultdict(list)
for k, j in recipe_dictionary.items():
    for i in j:
        m = re.search(non_repeat, i)
        if m:
            d[k].append(m.groups()[0])
d
defaultdict(list, {134: ['chicken'], 523: ['egg'], 12: ['chicken', 'egg']})
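The pattern above matches substrings, so "egg" would also be found inside a name like "eggplant", and re.search stops at the first tag in each name. A minimal variation, assuming whole-word matching is what you want (the "eggplant" entry is hypothetical test data): it compiles the pattern once, escapes each tag with re.escape, wraps the alternation in \b word boundaries, and uses findall to collect every tag in a name.

```python
import re
from collections import defaultdict

recipe_dictionary = {
    134: ['salt', 'chicken', 'tomato paste canned'],
    523: ['toast whole grain', 'feta cheese', 'eggplant', 'egg', 'salt'],  # 'eggplant' is hypothetical
    12: ['whole chicken', 'rice', 'parsley', 'egg'],
}
non_repeatable_ingredients = ['egg', 'chicken', 'beef']

# Non-capturing group, so findall returns the whole matched tag.
# \b on both sides means 'eggplant' does not match 'egg'.
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, non_repeatable_ingredients)) + r')\b')

d = defaultdict(list)
for recipe_id, ingredients in recipe_dictionary.items():
    for ingredient in ingredients:
        # findall returns every tag found in the name, not only the first one.
        for match in pattern.findall(ingredient):
            if match not in d[recipe_id]:
                d[recipe_id].append(match)

print(dict(d))
# {134: ['chicken'], 523: ['egg'], 12: ['chicken', 'egg']}
```

As in the original answer, recipes with no matches simply do not appear in d.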
Answer 2 (score: 0)
>>> reduced_recipe_dictionary = {k: list(filter(lambda x: x in non_repeatable_ingredients, v)) for k,v in recipe_dictionary.items()}
>>> reduced_recipe_dictionary
{134: ['chicken'], 523: ['egg'], 12: ['egg']}
>>>
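If your ingredient names don't exactly match the items in the non_repeatable_ingredients list, you can use fuzz.partial_ratio from the fuzzywuzzy module to keep the best-matching ingredients (those with a ratio greater than, say, 80). Install it first with pip install fuzzywuzzy. Note that the code below passes the whole ingredient list v, rather than each individual ingredient, to partial_ratio.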
>>> from fuzzywuzzy import fuzz
>>> reduced_recipe_dictionary = {k: list(filter(lambda x: fuzz.partial_ratio(v,x) >80, non_repeatable_ingredients)) for k,v in recipe_dictionary.items()}
>>> reduced_recipe_dictionary
{134: ['chicken'], 523: ['egg'], 12: ['chicken']}
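If you want to score each ingredient separately instead of the whole list, here is a sketch that uses only the standard library. It swaps in difflib.SequenceMatcher as a stand-in for fuzzywuzzy: the helper partial_similarity slides the shorter string over the longer one and keeps the best ratio. This is a simplified approximation, not fuzzywuzzy's exact partial_ratio algorithm, so scores can differ; the helper names and the "boiled eggs" entry are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical sample data; 'boiled eggs' is an invented "dirty" name.
recipe_dictionary = {
    134: ['salt', 'chicken', 'tomato paste canned'],
    523: ['toast whole grain', 'feta cheese', 'boiled eggs', 'salt'],
    12: ['chicken', 'rice', 'parsley'],
}
non_repeatable_ingredients = ['egg', 'chicken', 'beef']


def partial_similarity(a, b):
    """Hypothetical approximation of a partial match score on a 0-100 scale."""
    short, longer = (a, b) if len(a) <= len(b) else (b, a)
    if not short:
        return 0
    best = 0.0
    # Compare the shorter string with every equal-length window of the longer one.
    for start in range(len(longer) - len(short) + 1):
        window = longer[start:start + len(short)]
        best = max(best, SequenceMatcher(None, short, window).ratio())
    return round(100 * best)


def reduce_fuzzy(recipes, tags, threshold=80):
    """Keep, per recipe, the tags scoring above threshold against some ingredient."""
    reduced = {}
    for recipe_id, ingredients in recipes.items():
        names = [name.lower() for name in ingredients]
        # Score each tag against each ingredient separately, not the whole list.
        reduced[recipe_id] = [
            tag for tag in tags
            if any(partial_similarity(tag, name) > threshold for name in names)
        ]
    return reduced


print(reduce_fuzzy(recipe_dictionary, non_repeatable_ingredients))
# {134: ['chicken'], 523: ['egg'], 12: ['chicken']}
```

This is written for clarity rather than speed; on large inputs it is still one fuzzy comparison per ingredient/tag pair, so a cheap exact or word-level check first can cut down how often it runs.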