I take start_date, end_date and resolution as input from the user, and I want to split the range between the start and end dates according to the resolution:
start_date = 2019-03-05 00:00:00
end_date = 2019-03-06 00:00:00
resolution = 15mins
The range from the start date to the end date has to be split at intervals of resolution.
I know it can be done like this:
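import pandas as pd
from datetime import datetime

start_date = datetime.strptime(start_date, '%Y-%m-%d %H:%M:%S')
end_date = datetime.strptime(end_date, '%Y-%m-%d %H:%M:%S')
# '15min' is the current alias for minutes; the older 'T' alias is deprecated in recent pandas
dates = pd.date_range(start_date, end_date, freq='15min').tolist()
dates = pd.Series(dates)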
But this gives a result like the following:
0
2018-01-01 00:00:00
2018-01-01 00:15:00
2018-01-01 00:30:00
2018-01-01 00:45:00
2018-01-01 01:00:00
2018-01-01 01:15:00
2018-01-01 01:30:00
But I want to split it into two columns and remove characters such as - and :, so that it looks like this:
Start_time end_time
201801010000 201801010015
201801010015 201801010030
201801010030 201801010045
201801010045 201801010100
201801010100 201801010115
How can this be done?
Answer 0 (score: 3)
Use Series.dt.strftime to change the format of the dates, then use concat with a column shifted by Series.shift:
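import pandas as pd

start_date = '2019-03-05 00:00:00'
end_date = '2019-03-06 00:00:00'
# '15mins' is not a valid frequency alias, so drop the trailing s
resolution = '15min'
dates = pd.date_range(start_date, end_date, freq=resolution)
# format each timestamp as YYYYmmddHHMM, without '-', ':' or spaces
dates = pd.Series(dates).dt.strftime('%Y%m%d%H%M')
# shift(-1) pairs each timestamp with the one that follows it
df = pd.concat([dates, dates.shift(-1)], axis=1, keys=('Start_time', 'end_time'))
print(df)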
Start_time end_time
0 201903050000 201903050015
1 201903050015 201903050030
2 201903050030 201903050045
3 201903050045 201903050100
4 201903050100 201903050115
.. ... ...
92 201903052300 201903052315
93 201903052315 201903052330
94 201903052330 201903052345
95 201903052345 201903060000
96 201903060000 NaN
[97 rows x 2 columns]
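The NaN in the last row comes from shift(-1) itself: the final timestamp has no successor to pull up. A minimal sketch on a shortened range (four timestamps, chosen here only for illustration) makes that visible:

```python
import pandas as pd

# Four timestamps, 15 minutes apart (a shortened stand-in for the full day above).
dates = pd.Series(pd.date_range('2019-03-05 00:00:00', periods=4, freq='15min'))
stamps = dates.dt.strftime('%Y%m%d%H%M')

# shift(-1) moves every value up one row; the last row has no
# successor, so it is filled with a missing value.
shifted = stamps.shift(-1)
print(shifted.tolist())
```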
If you need to drop the last row (the one whose end_time is NaN), add DataFrame.iloc:
df = pd.concat([dates, dates.shift(-1)], axis=1, keys=('Start_time', 'end_time')).iloc[:-1]
print(df)
Start_time end_time
0 201903050000 201903050015
1 201903050015 201903050030
2 201903050030 201903050045
3 201903050045 201903050100
4 201903050100 201903050115
.. ... ...
91 201903052245 201903052300
92 201903052300 201903052315
93 201903052315 201903052330
94 201903052330 201903052345
95 201903052345 201903060000
[96 rows x 2 columns]
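A possible variant, not from the answer itself, avoids creating the NaN row in the first place by pairing the formatted timestamps through slicing. The helper name split_intervals and its fmt parameter are hypothetical and only used for this sketch:

```python
import pandas as pd

def split_intervals(start_date, end_date, resolution, fmt='%Y%m%d%H%M'):
    """Return consecutive (start, end) pairs as formatted strings.

    Hypothetical helper for illustration; not part of pandas.
    """
    # strftime on a DatetimeIndex returns an Index of strings.
    stamps = pd.date_range(start_date, end_date, freq=resolution).strftime(fmt)
    # Pair every timestamp except the last with its successor.
    return pd.DataFrame({'Start_time': stamps[:-1], 'end_time': stamps[1:]})

df = split_intervals('2019-03-05 00:00:00', '2019-03-06 00:00:00', '15min')
print(df.shape)
print(df.tail(2))
```

Because the two slices have the same length, no row needs to be dropped afterwards.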
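Another idea is to use the DataFrame constructor. The difference from the solution above is the last value of end_time: it is the final timestamp plus one resolution step instead of NaN: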
import pandas as pd

start_date = '2019-03-05 00:00:00'
end_date = '2019-03-06 00:00:00'
resolution = '15min'
dates = pd.date_range(start_date, end_date, freq=resolution)
# end_time is each start plus one resolution step, formatted the same way
df = pd.DataFrame({'Start_time': dates.strftime('%Y%m%d%H%M'),
                   'end_time': (dates + pd.to_timedelta(resolution)).strftime('%Y%m%d%H%M')})
print(df)
Start_time end_time
0 201903050000 201903050015
1 201903050015 201903050030
2 201903050030 201903050045
3 201903050045 201903050100
4 201903050100 201903050115
.. ... ...
92 201903052300 201903052315
93 201903052315 201903052330
94 201903052330 201903052345
95 201903052345 201903060000
96 201903060000 201903060015
[97 rows x 2 columns]
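Row 96 appears because date_range includes end_date itself as a start time, so adding one step produces an interval that lies past the requested range. A short sketch to check this, using the same dates as above:

```python
import pandas as pd

end_date = pd.Timestamp('2019-03-06 00:00:00')
dates = pd.date_range('2019-03-05 00:00:00', end_date, freq='15min')
step = pd.to_timedelta('15min')

# date_range includes end_date, so the last interval starts at end_date...
print(dates[-1] == end_date)
# ...and its end lies one step past the requested range.
print(dates[-1] + step)
```

If that extra interval is unwanted, dropping the last row (for example with .iloc[:-1]) gives the same 96 rows as the shift-based solution.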
Answer 1 (score: 1)
You can simply use shift:
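dates = pd.Series(dates)
# shift() moves values down one row, so the shifted column holds each interval's start
df = pd.concat([dates.shift(), dates], axis=1, keys=('Start_time', 'end_time')).dropna()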
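The direction of the shift decides which column ends up holding the start and which the end, so it is worth checking on a tiny example. The sketch below uses three timestamps, an arbitrary choice for illustration:

```python
import pandas as pd

dates = pd.Series(pd.date_range('2019-03-05 00:00:00', periods=3, freq='15min'))

# shift() (periods=1) moves values down: row i holds the timestamp of row i-1.
previous = dates.shift()
# shift(-1) moves values up: row i holds the timestamp of row i+1.
following = dates.shift(-1)

print(pd.concat([previous, dates, following], axis=1,
                keys=('previous', 'current', 'next')))
```

With shift() the missing value lands in the first row, with shift(-1) in the last, which is why dropna() removes a different row in each case.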