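Pandas: handling other strings that are not included in the mapping

Time: 2020-01-15 15:00:39

Tags: python pandas

I have code like this (see the answer to my question): How to parse different string date formats?

The code works, but I have had to add new data to the table that I had not anticipated (see indices 10-16):

Input data: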

    period             signed
0   Q2 '20 Base       01/01/20
1   Q3 '20 Base       01/01/20
2   Q1 '21 Base       01/01/20
3   February '20 Base 01/01/20
4   March '20 Peak    01/01/20
5   Summer 22 Base    01/01/20
6   Winter 20 Peak    01/01/20
7   Summer 21 Base    02/01/20
8   Year 2021         03/01/20
9   October '21 Peak  04/01/20 
10  12/03/20 base     05/01/20
11  Week 8 '20        06/01/20
12  Weekend base      07/01/20
13  Monday base       08/01/20
14  BOM base          09/01/20
15  Year 2020         10/01/20
16  12-14 April '20   11/01/20

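For every period whose first word is in my datemap, I want to return the values from the datemap. But for all other strings that are not included in the mapping (indices 10-16), I want to return the date from the signed column in 4 new columns: 1) day 2) month 3) quarter 4) year.

This is the code so far: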

datemap = { 'January' :  {'day' : 1, 'month' : 1, 'quarter' : 1}, 
            'February' : {'day' : 1, 'month' : 2, 'quarter' : 1}, 
            'March' :    {'day' : 1, 'month' : 3, 'quarter' : 1}, 
            # and so on ...
            'Spring' : {'day' : 1, 'month' : 1, 'quarter' : 1}, 
            'Summer' : {'day' : 1, 'month' : 4, 'quarter' : 2}, 
            'Fall' :   {'day' : 1, 'month' : 7, 'quarter' : 3}, 
            'Winter' : {'day' : 1, 'month' : 10, 'quarter' : 4}, 
            'Q1' : {'day' : 1, 'month' : 1, 'quarter' : 1}, 
            'Q2' : {'day' : 1, 'month' : 4, 'quarter' : 2}, 
            'Q3' : {'day' : 1, 'month' : 7, 'quarter' : 3}, 
            'Q4' : {'day' : 1, 'month' : 10, 'quarter' : 4}, 
            'Year' : {'day' : 1, 'month' : 1, 'quarter' : 1} }

df['day'] = df.apply (lambda r: datemap[r['period'].split()[0]]['day'], axis=1)
df['month'] = df.apply (lambda r: datemap[r['period'].split()[0]]['month'], axis=1)
df['quarter'] = df.apply (lambda r: datemap[r['period'].split()[0]]['quarter'], axis=1)
df['year'] = df.apply (lambda r: "20" + r['period'].split()[1][-2:], axis=1)

Desired output data:

                     day  month quarter year
0   Q2 '20 Base         01  04    2       2020
1   Q3 '20 Peak         01  07    3       2020
2   Q1 '21 Base         01  01    1       2021
3   February '20 Base   01  02    1       2020
4   March '20 Peak      01  03    1       2020
5   Summer 22 Base      01  04    2       2022
6   Winter 20 Peak      01  10    4       2020
7   Summer 21 Base      01  04    2       2021
8   Year 2021           01  01    1       2021
9   October '21 Base    01  10    4       2021
10  12/03/20 base       05  01    1       2020
11  Week 8 '20          06  01    1       2020
12  Weekend base        07  01    1       2020
13  Monday base         08  01    1       2020
14  BOM base            09  01    1       2020
15  Year 2020           10  01    1       2020
16  12-14 April '20     11  01    1       2020 

1 answer:

Answer 0 (score: 1)

You can do it the following way. It is no longer a nice one-liner, but it works:

import calendar

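def get_datemap_data(row,key,key_datemap):
    # key is the first word of 'period'; key_datemap is 'day', 'month', 'quarter' or 'year'
    try:
        if key_datemap == "year":
            if key in datemap:
                # known key: take the last two characters of the second word of 'period'
                return row['period'].split()[1][-2:]
            else:
                # unknown key: jump to the ValueError handler below
                raise ValueError
        else:
            return datemap[key][key_datemap]
    except KeyError:
        # key is not in datemap: fall back to the 'signed' date (dd/mm/yy)
        signed_split = row["signed"].split("/")
        map_to_signed = {"day":0,"month":1}
        if key_datemap == "quarter":
            # look up the quarter via the month name (needs every month in datemap)
            return datemap[calendar.month_name[int(signed_split[1])]]["quarter"]
        return int(signed_split[map_to_signed[key_datemap]])
    except ValueError:
        # two-digit year from 'signed'; the caller prepends "20"
        signed_split = row["signed"].split("/")
        return signed_split[2]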


df['day'] = df.apply (lambda r: get_datemap_data(r,r['period'].split()[0],'day'), axis=1)
df['month'] = df.apply (lambda r: get_datemap_data(r,r['period'].split()[0],'month'), axis=1)
df['quarter'] = df.apply (lambda r: get_datemap_data(r,r['period'].split()[0],'quarter'), axis=1)
df['year'] = df.apply (lambda r: "20" + get_datemap_data(r,r['period'].split()[0],'year'), axis=1)

Output:

               period    signed  day  month  quarter  year
0         Q2 '20 Base  01/01/20    1      4        2  2020
1         Q3 '20 Base  01/01/20    1      7        3  2020
2         Q1 '21 Base  01/01/20    1      1        1  2021
3   February '20 Base  01/01/20    1      2        1  2020
4      March '20 Peak  01/01/20    1      3        1  2020
5      Summer 22 Base  01/01/20    1      4        2  2022
6      Winter 20 Peak  01/01/20    1     10        4  2020
7      Summer 21 Base  02/01/20    1      4        2  2021
8           Year 2021  03/01/20    1      1        1  2021
9    October '21 Peak  04/01/20    1     10        4  2021
10      12/03/20 base  05/01/20    5      1        1  2020
11         Week 8 '20  06/01/20    6      1        1  2020
12       Weekend base  07/01/20    7      1        1  2020
13        Monday base  08/01/20    8      1        1  2020
14           BOM base  09/01/20    9      1        1  2020
15          Year 2021  10/01/20    1      1        1  2021
16    12-14 April '20  11/01/20   11      1        1  2020
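
If the row-wise apply becomes slow on a larger table, the same fallback logic can probably be expressed in a vectorized way. The following is only a sketch, not the code used to produce the output above: the abbreviated datemap, the five sample rows, and the names key, known, signed and lookup are assumptions made for illustration, and signed is assumed to always be in dd/mm/yy format.

```python
import pandas as pd

# Abbreviated mapping for illustration; a real datemap lists every month, season and quarter.
datemap = {
    "February": {"day": 1, "month": 2, "quarter": 1},
    "Summer": {"day": 1, "month": 4, "quarter": 2},
    "Q2": {"day": 1, "month": 4, "quarter": 2},
    "Year": {"day": 1, "month": 1, "quarter": 1},
}

# A few sample rows taken from the input table.
df = pd.DataFrame({
    "period": ["Q2 '20 Base", "Summer 22 Base", "Year 2021", "Week 8 '20", "Monday base"],
    "signed": ["01/01/20", "01/01/20", "03/01/20", "06/01/20", "08/01/20"],
})

# First word of each period, e.g. "Q2", "Summer", "Week".
key = df["period"].str.split().str[0]
known = key.isin(datemap.keys())

# Parse the signed date once (assumed format: day/month/two-digit year).
signed = pd.to_datetime(df["signed"], format="%d/%m/%y")

# Lookup table indexed by key, with the columns day, month and quarter.
lookup = pd.DataFrame.from_dict(datemap, orient="index")

for col in ["day", "month", "quarter"]:
    mapped = key.map(lookup[col])        # NaN where the key is not in datemap
    fallback = getattr(signed.dt, col)   # value taken from the signed date
    df[col] = mapped.fillna(fallback).astype(int)

# Year: last two characters of the second word for known keys, else the signed year.
year_from_period = "20" + df["period"].str.split().str[1].str[-2:]
df["year"] = year_from_period.where(known, signed.dt.year.astype(str)).astype(int)

print(df)
```

Parsing signed with pd.to_datetime also means the quarter comes straight from the date, so the fallback no longer depends on every month name being present in datemap.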

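Some remarks

  1. If you want all months and days to have a leading zero, you have to convert them to strings and pad them with a zero.

  2. There are some errors in your desired output:

    2.a. Index 0: it should be quarter 2, not 1

    2.b. Indices 8 and 15: both are the same, but you want different output? That is not possible. I used the output for index 8. If you want it to be based on signed, remove the Year entry from datemap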
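
For remark 1, a minimal sketch of the zero padding could look like this. The small DataFrame is made up for illustration; only the column names day and month come from the code above.

```python
import pandas as pd

# Hypothetical sample values standing in for the computed columns.
df = pd.DataFrame({"day": [1, 5, 11], "month": [4, 1, 10]})

# Convert to string, then left-pad with zeros to a width of 2.
for col in ["day", "month"]:
    df[col] = df[col].astype(str).str.zfill(2)

print(df)
```

Note that the columns hold strings afterwards, so any later numeric comparison or sorting would need a conversion back to integers.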
