Pandas: extracting values based on multiple columns

Time: 2020-05-10 16:44:21

Tags: python function dictionary

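I want to map values based on two columns of my pandas DataFrame, named "signed" and "period". My rules depend on the "period" column:

  • If the value contains a month or a quarter, I want to return the corresponding date (the first day of that month or quarter)
  • If the value contains Winter or Summer, I also want to return the corresponding date, where Summer starts in April and Winter starts in October
  • For everything else, I want to return the date from the "signed" column

I want to add 4 new columns, as shown in the expected output.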

I have input data:

    period          signed
    May-20          16/05/2020
    August-20       10/05/2020
    Q2-20           14/05/2020
    Q3-20           21/05/2020
    10/05/2020      11/05/2020
    Summer 21       18/05/2020
    Winter 22       19/05/2020
    Weekend         20/05/2020
    week 15-20      21/05/2020

I want this output:

    period          signed       day  month quarter year
    May-20          16/05/2020    1    5      2     2020
    August-20       10/05/2020    1    8      3     2020
    Q2-20           14/05/2020    1    4      2     2020
    Q3-21           21/05/2020    1    7      3     2021
    10/05/2020      11/05/2020    10   5      2     2020
    Summer 21       18/05/2020    1    4      2     2021
    Winter 22       19/05/2020    1    10     4     2022    
    Weekend         20/05/2020    20   5      2     2020
    week 15-20      21/05/2020    21   5      2     2020
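
The mapping from `period` and `signed` to these four columns could be sketched as follows. This is a minimal sketch under stated assumptions, not a tested solution for real data: the helper name `extract_date_parts` and the regular expression are hypothetical, keywords are assumed to be full English month names, `Q1` to `Q4`, `Summer`, or `Winter`, and the two-digit year is assumed to belong to the 2000s. A `period` that is itself a `dd/mm/yyyy` date is used directly, which is what the `10/05/2020` row of the expected output suggests.

```python
import calendar
import re
from datetime import datetime

# Full month names -> month number (assumes an English locale).
MONTHS = {name: i for i, name in enumerate(calendar.month_name) if name}
# First month of each quarter.
QUARTER_START = {"Q1": 1, "Q2": 4, "Q3": 7, "Q4": 10}
# Season start months as stated in the question.
SEASON_START = {"Summer": 4, "Winter": 10}

# A keyword, one space or hyphen, then a two-digit year: "May-20", "Summer 21".
PERIOD_RE = re.compile(r"^([A-Za-z]+\d?)[\s-](\d{2})$")


def extract_date_parts(period, signed):
    """Return day, month, quarter and year for one row as a dict."""
    match = PERIOD_RE.match(period.strip())
    if match:
        key, yy = match.groups()
        month = MONTHS.get(key) or QUARTER_START.get(key) or SEASON_START.get(key)
        if month:
            return {"day": 1, "month": month,
                    "quarter": (month - 1) // 3 + 1, "year": 2000 + int(yy)}
    # No rule matched: use `period` if it is a dd/mm/yyyy date, else `signed`.
    for text in (period, signed):
        try:
            date = datetime.strptime(text.strip(), "%d/%m/%Y")
        except ValueError:
            continue
        return {"day": date.day, "month": date.month,
                "quarter": (date.month - 1) // 3 + 1, "year": date.year}
    raise ValueError(f"no usable date in {period!r} / {signed!r}")
```

With a DataFrame `df` holding the two columns, the function could be applied row by row, for example with `df.join(df.apply(lambda r: extract_date_parts(r['period'], r['signed']), axis=1, result_type='expand'))`. Abbreviated month names such as `Aug-20` are not covered by this sketch and would fall back to `signed`.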

This is the code I am trying to implement, but it only returns the values from 'signed', which does not follow my rules:

    datemap = {'January':  {'day': 1, 'month': 1, 'quarter': 1},
               'February': {'day': 1, 'month': 2, 'quarter': 1},
               'March':    {'day': 1, 'month': 3, 'quarter': 1},
               # and so on ...
               'Spring': {'day': 1, 'month': 1, 'quarter': 1},
               'Summer': {'day': 1, 'month': 4, 'quarter': 2},
               'Fall':   {'day': 1, 'month': 7, 'quarter': 3},
               'Winter': {'day': 1, 'month': 10, 'quarter': 4},
               'Q1': {'day': 1, 'month': 1, 'quarter': 1},
               'Q2': {'day': 1, 'month': 4, 'quarter': 2},
               'Q3': {'day': 1, 'month': 7, 'quarter': 3},
               'Q4': {'day': 1, 'month': 10, 'quarter': 4},
               'Year': {'day': 1, 'month': 1, 'quarter': 1}}

    import calendar

    def get_datemap_data(row,key,key_datemap):
        try:
            if key_datemap == "year":
                if key in datemap:
                    return row['period'].split()[1][-2:]
                else:
                    raise ValueError
            else:
                return datemap[key][key_datemap]
        except KeyError:
            signed_split = row["signed"].split("/")
            map_to_signed = {"day":0,"month":1}
            if key_datemap == "quarter":
                return datemap[calendar.month_name[int(signed_split[1])]]["quarter"]
            return int(signed_split[map_to_signed[key_datemap]])
        except ValueError:
            signed_split = row["signed"].split("/")
            return signed_split[2]


    df['day'] = df.apply(lambda r: get_datemap_data(r, r['period'].split()[0], 'day'), axis=1)
    df['month'] = df.apply(lambda r: get_datemap_data(r, r['period'].split()[0], 'month'), axis=1)
    df['quarter'] = df.apply(lambda r: get_datemap_data(r, r['period'].split()[0], 'quarter'), axis=1)
    df['year'] = df.apply(lambda r: "20" + get_datemap_data(r, r['period'].split()[0], 'year'), axis=1)
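
A likely reason the code above always falls back to 'signed': `str.split()` with no arguments splits on whitespace only, so a hyphenated period such as `May-20` is passed to `get_datemap_data` as the key `'May-20'`, which is not in `datemap` and raises `KeyError`. The year lookup `row['period'].split()[1]` relies on the same whitespace split. The snippet below is a small sketch of splitting on either separator; the helper name `period_key` is hypothetical and not part of the original code.

```python
import re


def period_key(period):
    """Return the leading keyword of a period string, e.g. 'May' for 'May-20'."""
    # Split once on a run of whitespace and/or hyphens and keep the first part.
    return re.split(r"[\s-]+", period.strip(), maxsplit=1)[0]


# Whitespace-only splitting keeps the year suffix attached to the keyword.
print("May-20".split()[0])      # May-20
# Splitting on hyphens as well isolates the keyword used in datemap.
print(period_key("May-20"))     # May
print(period_key("Summer 21"))  # Summer
```

Keys that are still absent from `datemap`, such as `Weekend` or `week`, would continue to raise `KeyError` and fall back to 'signed', which matches the third rule.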

0 answers:

No answers yet