Applying a custom function to a pandas DataFrame

Time: 2020-02-12 17:18:46

Tags: python pandas function dataframe recursion

I have defined the following function, which searches a nested dictionary for the values of a specific key.

def get_recursively(search_dict, field):

    fields_found = []

    if len(search_dict) == 1:
        search_dict = search_dict[0]

    for key, value in search_dict.items():

        if key == field:
            fields_found.append(value)

        elif isinstance(value, dict):
            results = get_recursively(value, field)
            for result in results:
                fields_found.append(result)

        elif isinstance(value, list):
            for item in value:
                if isinstance(item, dict):
                    more_results = get_recursively(item, field)
                    for another_result in more_results:
                        fields_found.append(another_result)

    return fields_found

Now suppose I want to apply this function to a column of a pandas DataFrame and save the results in a new column. My data looks like this:

id              metadata                    field
123             {"dek": "fashion...}        frontend
124             {"dek": "house...}          frontend

I tried the following code:

df['NewCol'] = df.apply(lambda x: get_recursively(x['metadata'], x['field']), axis=1)

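So in this case I am passing the values of the metadata column and the field column (i.e. "frontend") as arguments. I get an error: KeyError: (0, 'occurred at index 2'). When I test my function on a single nested dictionary stored in a variable, it gives me exactly what I need, namely the value of the key frontend. What am I doing wrong here?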

Here is an example of the nested dictionaries I am working with:

{"dek": "<p>Don\'t forget to buy a card</p>", "links": {"edit": {"dev": "//patty-menshealth.feature.hearstapps.net/en/content/edit/76517422-96ad-4b5c-a24a-c080c58bce0c", "prod": "//patty-menshealth.prod.hearstapps.com/en/content/edit/76517422-96ad-4b5c-a24a-c080c58bce0c", "stage": "//patty-menshealth.stage.hearstapps.net/en/content/edit/76517422-96ad-4b5c-a24a-c080c58bce0c"}, "frontend": {"dev": "//menshealth.feature.hearstapps.net/trending-news/a19521193/fathers-day-weekend-plans/", "prod": "//www.menshealth.com/trending-news/a19521193/fathers-day-weekend-plans/", "stage": "//menshealth.stage.hearstapps.net/trending-news/a19521193/fathers-day-weekend-plans/"}}, "header": {"title_color": 1, "title_layout": 1}, "sponsor": {"program_type": 1, "tracking_urls": []}, "social_dek": "<p>Don\'t forget to buy a card</p>", "auto_social": 0, "index_title": "\u200bWeekend Guide: Treat Your Dad Right This Father\'s Day", "short_title": "Treat Your Dad Right This Father\'s Day", "social_title": "\u200bWeekend Guide: Treat Your Dad Right This Father\'s Day", "editors_notes": "<p>nid: 2801076<br>created_date: 2017-06-16 13:00:01<br>compass_feed_date: 2017-06-21 14:01:58<br>contract_id: 40</p>", "seo_meta_title": "Treat Your Dad Right This Father\'s Day\u200b | Men’s Health", "social_share_url": "/trending-news/a19521193/fathers-day-weekend-plans/", "seo_related_links": {}, "editor_attribution": "by", "hide_from_homepage": 1, "syndication_rights": 3, "seo_meta_description": "\u200bFrom gifts to food ideas, we\'ve got your Father\'s Day covered. Just don\'t forget to buy him a card."}

1 Answer:

Answer 0 (score: 1)

The KeyError appears to be caused by the first if statement:

if len(search_dict) == 1:
    search_dict = search_dict[0]
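
To see why that branch fails, here is a minimal sketch that isolates it. The helper name first_branch and both sample dictionaries are made up for illustration; they are not from the question's data. On a dict, search_dict[0] looks up the key 0 rather than taking the first element, so any dictionary with exactly one key (at the top level or nested) raises KeyError: 0. A dictionary with several keys skips the branch, which would explain why a single test dictionary worked.

```python
def first_branch(search_dict):
    """Hypothetical helper that mimics only the first `if` of the question's function."""
    if len(search_dict) == 1:
        # On a dict this is a lookup of the key 0, not positional indexing.
        search_dict = search_dict[0]
    return search_dict


# Made-up sample data (assumption, not taken from the question).
multi_key = {"dek": "text", "links": {}}
single_key = {"links": {"frontend": "url"}}

# Two keys: the branch is skipped and the dict comes back unchanged.
unchanged = first_branch(multi_key)

# One key: the branch runs and the lookup of key 0 fails.
caught = None
try:
    first_branch(single_key)
except KeyError as exc:
    caught = exc

print(type(caught).__name__, caught.args)
```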

I had trouble with the dict type check, so I chose to check against Mapping instead (collections.abc.Mapping; the older collections.Mapping alias was removed in Python 3.10). I tested the following solution and it appears to work.

from collections.abc import Mapping

def get_recursively(search_dict, field):
    # Collect every value stored under `field`, at any nesting depth.
    fields_found = []
    for key, value in search_dict.items():

        if key == field:
            fields_found.append(value)

        elif isinstance(value, Mapping):
            # Nested dictionary: recurse and keep whatever it finds.
            fields_found.extend(get_recursively(value, field))

        elif isinstance(value, list):
            # A list may hold further dictionaries to search.
            for item in value:
                if isinstance(item, Mapping):
                    fields_found.extend(get_recursively(item, field))

    return fields_found
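
As a rough end-to-end check, the sketch below applies a function of this shape to a small DataFrame with the same df.apply call as in the question. The two rows and their URLs are invented for illustration, and the metadata column is assumed to already hold dict objects rather than JSON strings. The second row is deliberately a one-key dictionary, the case that triggered the KeyError.

```python
from collections.abc import Mapping

import pandas as pd


def get_recursively(search_dict, field):
    """Return every value stored under `field`, at any nesting depth."""
    fields_found = []
    for key, value in search_dict.items():
        if key == field:
            fields_found.append(value)
        elif isinstance(value, Mapping):
            fields_found.extend(get_recursively(value, field))
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, Mapping):
                    fields_found.extend(get_recursively(item, field))
    return fields_found


# Invented sample rows (assumption): metadata already parsed into dicts.
df = pd.DataFrame({
    "id": [123, 124],
    "metadata": [
        {"dek": "fashion", "links": {"frontend": {"prod": "//example.com/a"}}},
        {"links": {"frontend": {"prod": "//example.com/b"}}},  # one-key dict
    ],
    "field": ["frontend", "frontend"],
})

# Same call as in the question: one list of matches per row.
df["NewCol"] = df.apply(
    lambda row: get_recursively(row["metadata"], row["field"]), axis=1
)
print(df[["id", "NewCol"]])
```

Each cell of NewCol holds a list, because the function returns every match for the key instead of only the first one.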