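I have code similar to the following (see the answer to my question: How to parse different string date formats?).
The code works, but I had to add new data to the table that I had not anticipated (see indices 10-16):
Input data: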
period signed
0 Q2 '20 Base 01/01/20
1 Q3 '20 Base 01/01/20
2 Q1 '21 Base 01/01/20
3 February '20 Base 01/01/20
4 March '20 Peak 01/01/20
5 Summer 22 Base 01/01/20
6 Winter 20 Peak 01/01/20
7 Summer 21 Base 02/01/20
8 Year 2021 03/01/20
9 October '21 Peak 04/01/20
10 12/03/20 base 05/01/20
11 Week 8 '20 06/01/20
12 Weekend base 07/01/20
13 Monday base 08/01/20
14 BOM base 09/01/20
15 Year 2020 10/01/20
16 12-14 April '20 11/01/20
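I want to return everything that is covered by my datemap. For every other string that the mapping does not cover (indices 10-16), I want to take the date from the "signed" column and return it in the 4 new columns: 1) day, 2) month, 3) quarter, 4) year.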
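The fallback only needs the four parts of the signed date. Below is a minimal standard-library sketch, assuming signed is always DD/MM/YY (the desired output suggests this: 05/01/20 becomes day 05, month 01). The helper name fields_from_signed is hypothetical, and the quarter is derived arithmetically from the month rather than looked up in datemap:

```python
from datetime import datetime


def fields_from_signed(signed: str) -> dict:
    """Split a 'signed' date such as '05/01/20' into day, month, quarter, year.

    Assumes the DD/MM/YY layout; '%y' maps '20' to 2020.
    """
    d = datetime.strptime(signed, "%d/%m/%y")
    return {
        "day": d.day,
        "month": d.month,
        "quarter": (d.month - 1) // 3 + 1,  # months 1-3 -> 1, 4-6 -> 2, ...
        "year": d.year,
    }


print(fields_from_signed("05/01/20"))
```

Parsing with strptime rather than split("/") also rejects malformed dates with a ValueError instead of silently returning wrong parts.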
Here is the code so far:
datemap = {'January':  {'day': 1, 'month': 1,  'quarter': 1},
           'February': {'day': 1, 'month': 2,  'quarter': 1},
           'March':    {'day': 1, 'month': 3,  'quarter': 1},
           # and so on ...
           'Spring':   {'day': 1, 'month': 1,  'quarter': 1},
           'Summer':   {'day': 1, 'month': 4,  'quarter': 2},
           'Fall':     {'day': 1, 'month': 7,  'quarter': 3},
           'Winter':   {'day': 1, 'month': 10, 'quarter': 4},
           'Q1':       {'day': 1, 'month': 1,  'quarter': 1},
           'Q2':       {'day': 1, 'month': 4,  'quarter': 2},
           'Q3':       {'day': 1, 'month': 7,  'quarter': 3},
           'Q4':       {'day': 1, 'month': 10, 'quarter': 4},
           'Year':     {'day': 1, 'month': 1,  'quarter': 1}}

df['day'] = df.apply(lambda r: datemap[r['period'].split()[0]]['day'], axis=1)
df['month'] = df.apply(lambda r: datemap[r['period'].split()[0]]['month'], axis=1)
df['quarter'] = df.apply(lambda r: datemap[r['period'].split()[0]]['quarter'], axis=1)
df['year'] = df.apply(lambda r: "20" + r['period'].split()[1][-2:], axis=1)
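The lambdas above index datemap with the first word of period, so any row whose first word is not a key raises a KeyError. A quick check, assuming the elided months (April to December) are present in datemap, shows which rows are affected; note that row 15 ("Year 2020") is still matched by the 'Year' key:

```python
import calendar

# Keys of the question's datemap; the month names April-December are
# assumed to be present (the question elides them with "and so on").
DATEMAP_KEYS = set(calendar.month_name[1:]) | {
    "Spring", "Summer", "Fall", "Winter", "Q1", "Q2", "Q3", "Q4", "Year"}

PERIODS = ["Q2 '20 Base", "Q3 '20 Base", "Q1 '21 Base", "February '20 Base",
           "March '20 Peak", "Summer 22 Base", "Winter 20 Peak",
           "Summer 21 Base", "Year 2021", "October '21 Peak", "12/03/20 base",
           "Week 8 '20", "Weekend base", "Monday base", "BOM base",
           "Year 2020", "12-14 April '20"]

# Row indices whose first token has no datemap entry (these raise KeyError).
unmapped = [i for i, p in enumerate(PERIODS) if p.split()[0] not in DATEMAP_KEYS]
print(unmapped)  # [10, 11, 12, 13, 14, 16]
```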
Desired output:
period day month quarter year
0 Q2 '20 Base 01 04 2 2020
1 Q3 '20 Peak 01 07 3 2020
2 Q1 '21 Base 01 01 1 2021
3 February '20 Base 01 02 1 2020
4 March '20 Peak 01 03 1 2020
5 Summer 22 Base 01 04 2 2022
6 Winter 20 Peak 01 10 4 2020
7 Summer 21 Base 01 04 2 2021
8 Year 2021 01 01 1 2021
9 October '21 Base 01 10 4 2021
10 12/03/20 base 05 01 1 2020
11 Week 8 '20 06 01 1 2020
12 Weekend base 07 01 1 2020
13 Monday base 08 01 1 2020
14 BOM base 09 01 1 2020
15 Year 2020 10 01 1 2020
16 12-14 April '20 11 01 1 2020
Answer 0 (score: 1)
You can do it as follows. It is no longer a neat one-liner, but it works:
import calendar

def get_datemap_data(row, key, key_datemap):
    # row: one DataFrame row; key: first word of row['period'];
    # key_datemap: the field to return ('day', 'month', 'quarter' or 'year')
    try:
        if key_datemap == "year":
            if key in datemap:
                # Last two digits of the second word: "'20" -> "20", "2021" -> "21"
                return row['period'].split()[1][-2:]
            else:
                raise ValueError  # unknown key: take the year from 'signed'
        else:
            return datemap[key][key_datemap]
    except KeyError:
        # Key not in datemap: fall back to 'signed', which is DD/MM/YY
        signed_split = row["signed"].split("/")
        map_to_signed = {"day": 0, "month": 1}
        if key_datemap == "quarter":
            # Requires every month name (January ... December) in datemap
            return datemap[calendar.month_name[int(signed_split[1])]]["quarter"]
        return int(signed_split[map_to_signed[key_datemap]])
    except ValueError:
        signed_split = row["signed"].split("/")
        return signed_split[2]  # two-digit year, e.g. "20"

df['day'] = df.apply(lambda r: get_datemap_data(r, r['period'].split()[0], 'day'), axis=1)
df['month'] = df.apply(lambda r: get_datemap_data(r, r['period'].split()[0], 'month'), axis=1)
df['quarter'] = df.apply(lambda r: get_datemap_data(r, r['period'].split()[0], 'quarter'), axis=1)
df['year'] = df.apply(lambda r: "20" + get_datemap_data(r, r['period'].split()[0], 'year'), axis=1)
period signed day month quarter year
0 Q2 '20 Base 01/01/20 1 4 2 2020
1 Q3 '20 Base 01/01/20 1 7 3 2020
2 Q1 '21 Base 01/01/20 1 1 1 2021
3 February '20 Base 01/01/20 1 2 1 2020
4 March '20 Peak 01/01/20 1 3 1 2020
5 Summer 22 Base 01/01/20 1 4 2 2022
6 Winter 20 Peak 01/01/20 1 10 4 2020
7 Summer 21 Base 02/01/20 1 4 2 2021
8 Year 2021 03/01/20 1 1 1 2021
9 October '21 Peak 04/01/20 1 10 4 2021
10 12/03/20 base 05/01/20 5 1 1 2020
11 Week 8 '20 06/01/20 6 1 1 2020
12 Weekend base 07/01/20 7 1 1 2020
13 Monday base 08/01/20 8 1 1 2020
14 BOM base 09/01/20 9 1 1 2020
15 Year 2021 10/01/20 1 1 1 2021
16 12-14 April '20 11/01/20 11 1 1 2020
If you want all days and months to have a leading zero, you have to convert them to strings and pad them with a zero.
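A minimal sketch of that padding, using str.zfill; the helper name zero_pad is hypothetical. On a whole pandas column the same idea is df['day'].astype(str).str.zfill(2).

```python
def zero_pad(value) -> str:
    """Render a day or month number with a leading zero, e.g. 5 -> '05'."""
    return str(value).zfill(2)


print(zero_pad(5), zero_pad(10))  # 05 10
```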
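There are some errors in your desired output:
2.a. Index 0: the quarter should be 2, not 1.
2.b. Indices 8 and 15: both start with the same key (Year), yet you expect different outputs (index 8 from the datemap, index 15 from signed). That is not possible. I followed your output for index 8. If you want both to be based on signed, remove the Year entry from the datemap.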
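As an alternative to the try/except approach, one function can return all four values per row, using dict.get instead of exceptions for the fallback. This is a sketch, not the answer's method: it assumes a complete datemap in the question's layout (built here with calendar.month_name), assumes signed is DD/MM/YY, and the name period_fields is hypothetical. Unlike the code above, it returns the year as an int.

```python
import calendar
from datetime import datetime

# Assumed complete datemap in the question's layout: every month name,
# the seasons, Q1-Q4 and Year (the question elides April-December).
DATEMAP = {name: {"day": 1, "month": m, "quarter": (m - 1) // 3 + 1}
           for m, name in enumerate(calendar.month_name) if name}
DATEMAP.update({
    "Spring": {"day": 1, "month": 1, "quarter": 1},
    "Summer": {"day": 1, "month": 4, "quarter": 2},
    "Fall": {"day": 1, "month": 7, "quarter": 3},
    "Winter": {"day": 1, "month": 10, "quarter": 4},
    "Q1": {"day": 1, "month": 1, "quarter": 1},
    "Q2": {"day": 1, "month": 4, "quarter": 2},
    "Q3": {"day": 1, "month": 7, "quarter": 3},
    "Q4": {"day": 1, "month": 10, "quarter": 4},
    "Year": {"day": 1, "month": 1, "quarter": 1},
})


def period_fields(period: str, signed: str) -> tuple:
    """Return (day, month, quarter, year) for one row."""
    tokens = period.split()
    entry = DATEMAP.get(tokens[0])
    if entry is not None:
        # Year from the last two digits of the second word: "'20", "22", "2021"
        year = 2000 + int(tokens[1][-2:])
        return entry["day"], entry["month"], entry["quarter"], year
    # No datemap entry: fall back to 'signed', assumed to be DD/MM/YY
    d = datetime.strptime(signed, "%d/%m/%y")
    return d.day, d.month, (d.month - 1) // 3 + 1, d.year
```

With pandas, all four columns can then be filled in a single pass instead of four apply calls:
df['day'], df['month'], df['quarter'], df['year'] = zip(*(period_fields(p, s) for p, s in zip(df['period'], df['signed'])))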