Below are the SQL queries that update the date to the new format:
update data set Date=[Time Period]+'-01-01' where Frequency='0'
update data set Date=replace([Time Period],'Q1','-01-01')
where Frequency='2' and substring([Time Period],5,2)='Q1'
update data set Date=replace([Time Period],'Q2','-04-01')
where Frequency='2' and substring([Time Period],5,2)='Q2'
update data set Date=replace([Time Period],'Q3','-07-01')
where Frequency='2' and substring([Time Period],5,2)='Q3'
update data set Date=replace([Time Period],'Q4','-10-01')
where Frequency='2' and substring([Time Period],5,2)='Q4'
update data set Date=replace([Time Period],'M','-')+'-01'
where Frequency='3' and len([Time Period])=7
update data set Date=replace([Time Period],'M','-0')+'-01'
where Frequency='3' and len([Time Period])=6
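I have now loaded the same data into a Python DataFrame.
Below is sample data from the DataFrame, comma-separated. The Time Period column is the input and the Date column is the expected output; I need to convert Time Period into the format of the Date column.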
Frequency,Time Period,Date
0,2008,2008-01-01
0,1961,1961-01-01
2,2009Q1,2009-04-01
2,1975Q4,1975-10-01
2,2007Q3,2007-04-01
2,1959Q4,1959-10-01
2,1965Q4,1965-07-01
2,2008Q3,2008-07-01
3,1969M2,1969-02-01
3,1994M12,1994-12-01
3,1990M1,1990-01-01
3,1994M10,1994-10-01
3,2012M11,2012-11-01
3,1994M3,1994-03-01
Please tell me how to update the date in Python according to the conditions above.
Answer 0 (score: 0)
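Doing this in a vectorized way is a bit tricky, because a different offset has to be added for each kind of period.
Consider the following approach:
Source DF: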
In [337]: df
Out[337]:
    Frequency  Time Period
0           0         2008
1           0         1961
2           2       2009Q1
3           2       1975Q4
4           2       2007Q3
5           2       1959Q4
6           2       1965Q4
7           2       2008Q3
8           3       1969M2
9           3      1994M12
10          3       1990M1
11          3      1994M10
12          3      2012M11
13          3       1994M3
Solution:
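In [338]: %paste
# Rewrite quarters as month offsets (Q1 -> M0, Q2 -> M3, ...), then split
# the year and the trailing number into two temporary columns.
df[['y','mm']] = (df['Time Period']
                    .replace(['Q1', 'Q2', 'Q3', 'Q4'],
                             ['M0', 'M3', 'M6', 'M9'],
                             regex=True)
                    .str.extract(r'(\d{4})M?(\d+)?', expand=True))
# January of each year at month resolution, plus the extracted number as a
# month offset; rows without a number (annual data) get an offset of 0.
df['Date'] = (pd.to_datetime(df.pop('y'), format='%Y', errors='coerce')
                .values.astype('M8[M]')
              + pd.to_numeric(df.pop('mm'), errors='coerce')
                  .fillna(0).astype(int).values * np.timedelta64(1, 'M')
             ).astype('M8[D]')
## -- End pasted text --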
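Result (the quarterly rows are correct, but the monthly rows land one month late; for example, 1969M2 becomes 1969-03-01):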
In [339]: df
Out[339]:
    Frequency  Time Period        Date
0           0         2008  2008-01-01
1           0         1961  1961-01-01
2           2       2009Q1  2009-01-01
3           2       1975Q4  1975-10-01
4           2       2007Q3  2007-07-01
5           2       1959Q4  1959-10-01
6           2       1965Q4  1965-10-01
7           2       2008Q3  2008-07-01
8           3       1969M2  1969-03-01
9           3      1994M12  1995-01-01
10          3       1990M1  1990-02-01
11          3      1994M10  1994-11-01
12          3      2012M11  2012-12-01
13          3       1994M3  1994-04-01
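The one-month shift in the monthly rows comes from treating the month number as an offset from January. A minimal sketch of that arithmetic, assuming NumPy is available (the variable names are illustrative only and do not appear in the answer):

```python
import numpy as np

# January 1969, stored with month resolution.
base = np.datetime64("1969-01", "M")

# Month number 2 (February) used directly as an offset overshoots by one ...
shifted = base + 2 * np.timedelta64(1, "M")
# ... while (month number - 1) lands on the intended month.
correct = base + (2 - 1) * np.timedelta64(1, "M")

print(shifted.astype("M8[D]"))  # 1969-03-01
print(correct.astype("M8[D]"))  # 1969-02-01
```

The quarterly rows are unaffected because Q1 to Q4 were already rewritten as the zero-based offsets 0, 3, 6 and 9.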
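Mapping each quarter to its first month (Q1 -> M1, ..., Q4 -> M10) and subtracting 1 from the month number gives the correct date for both quarterly and monthly rows:
import numpy as np
import pandas as pd

df[['y','mm']] = (df['Time Period']
                    .replace(['Q1', 'Q2', 'Q3', 'Q4'],
                             ['M1', 'M4', 'M7', 'M10'],
                             regex=True)
                    .str.extract(r'(\d{4})M?(\d+)?', expand=True))
# Annual rows have no month, so default to month 1; the offset is (month - 1).
df['Date'] = (pd.to_datetime(df.pop('y'), format='%Y', errors='coerce')
                .values.astype('M8[M]')
              + (pd.to_numeric(df.pop('mm'), errors='coerce')
                   .fillna(1).astype(int).values - 1) * np.timedelta64(1, 'M')
             ).astype('M8[D]')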
Output:
Frequency Time Period Date
0 0 0 2008 2008-01-01
1 1 0 1961 1961-01-01
2 2 2 2009Q1 2009-01-01
3 3 2 1975Q4 1975-10-01
4 4 2 2007Q3 2007-07-01
5 5 2 1959Q4 1959-10-01
6 6 2 1965Q4 1965-10-01
7 7 2 2008Q3 2008-07-01
8 8 3 1969M2 1969-02-01
9 9 3 1994M12 1994-12-01
10 10 3 1990M1 1990-01-01
11 11 3 1994M10 1994-10-01
12 12 3 2012M11 2012-11-01
13 13 3 1994M3 1994-03-01
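For comparison, the SQL rules from the question can also be written as a plain row-wise function. This is only a sketch: period_to_date and QUARTER_START are hypothetical names, not part of either answer, and the approach gives up vectorization in favor of readability.

```python
import re

# Hypothetical lookup: first month of each quarter, as in the SQL updates.
QUARTER_START = {"1": "01", "2": "04", "3": "07", "4": "10"}


def period_to_date(period: str) -> str:
    """Return the first day of a period such as '2008', '2009Q1' or '1969M2'."""
    match = re.fullmatch(r"(\d{4})(?:([QM])(\d{1,2}))?", period.strip())
    if match is None:
        raise ValueError(f"unrecognised period: {period!r}")
    year, kind, number = match.groups()
    if kind is None:
        # Annual value: use 1 January.
        month = "01"
    elif kind == "Q":
        if number not in QUARTER_START:
            raise ValueError(f"unknown quarter in {period!r}")
        month = QUARTER_START[number]
    else:
        # Monthly value: zero-pad the month number.
        if not 1 <= int(number) <= 12:
            raise ValueError(f"unknown month in {period!r}")
        month = f"{int(number):02d}"
    return f"{year}-{month}-01"


print(period_to_date("2009Q1"))   # 2009-01-01
print(period_to_date("1994M12"))  # 1994-12-01
```

Assuming the Time Period column holds strings, it could be applied with df['Date'] = df['Time Period'].map(period_to_date), followed by pd.to_datetime if a datetime column is needed.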