How to split a date range into two columns based on a resolution?

Date: 2019-08-13 13:47:54

Tags: python python-3.x pandas

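I take start_date, end_date and resolution as input from the user, and I want to split the span between the start and end dates according to the resolution, for example: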

start_date = 2019-03-05 00:00:00
end_date = 2019-03-06 00:00:00
resolution = 15mins

The span from the start date to the end date must be split into intervals of length resolution.

I know it can be done like this:

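from datetime import datetime
import pandas as pd

start_date = datetime.strptime(start_date, '%Y-%m-%d %H:%M:%S')
end_date = datetime.strptime(end_date, '%Y-%m-%d %H:%M:%S')
# '15min' = 15 minutes ('15T' is an older, deprecated alias for the same thing)
dates = pd.date_range(start_date, end_date, freq = '15min').tolist()
dates = pd.Series(dates)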

But this gives a result like the following:

0
2018-01-01 00:00:00
2018-01-01 00:15:00
2018-01-01 00:30:00
2018-01-01 00:45:00
2018-01-01 01:00:00
2018-01-01 01:15:00
2018-01-01 01:30:00

But I want to split it into two columns and strip characters such as - and :, so that it looks like this:

Start_time             end_time
201801010000         201801010015
201801010015         201801010030
201801010030         201801010045
201801010045         201801010100
201801010100         201801010115

How can I do this?

2 Answers:

Answer 0 (score: 3)

Use Series.dt.strftime to change the format of the dates, then use concat together with the Series.shift-ed column:

import pandas as pd

start_date = '2019-03-05 00:00:00'
end_date = '2019-03-06 00:00:00'
# change the resolution by removing the 's' ('15mins' -> '15min', a valid pandas frequency alias)
resolution = '15min'

dates = pd.date_range(start_date, end_date, freq = resolution)
dates = pd.Series(dates).dt.strftime('%Y%m%d%H%M')

df = pd.concat([dates,dates.shift(-1)],axis=1, keys=('Start_time','end_time'))
print (df)
      Start_time      end_time
0   201903050000  201903050015
1   201903050015  201903050030
2   201903050030  201903050045
3   201903050045  201903050100
4   201903050100  201903050115
..           ...           ...
92  201903052300  201903052315
93  201903052315  201903052330
94  201903052330  201903052345
95  201903052345  201903060000
96  201903060000           NaN

[97 rows x 2 columns]
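To see what the shift(-1) pairing does without scrolling through 97 rows, here is a minimal sketch on a one-hour window (the shorter end date is an illustrative assumption, not from the question):

```python
import pandas as pd

# A one-hour window keeps the output small enough to check by eye
dates = pd.date_range('2019-03-05 00:00:00', '2019-03-05 01:00:00', freq='15min')
dates = pd.Series(dates).dt.strftime('%Y%m%d%H%M')

# shift(-1) moves every value up one row, so row i pairs dates[i] with dates[i + 1];
# the last row has no successor, so its end_time is NaN
df = pd.concat([dates, dates.shift(-1)], axis=1, keys=('Start_time', 'end_time'))
print(df)
```

The window has 5 timestamps, so you get 5 rows, and only the last one is incomplete.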

If you need to remove the last row (its end_time is NaN), add DataFrame.iloc[:-1]:

df = pd.concat([dates,dates.shift(-1)],axis=1, keys=('Start_time','end_time')).iloc[:-1]
print (df)
      Start_time      end_time
0   201903050000  201903050015
1   201903050015  201903050030
2   201903050030  201903050045
3   201903050045  201903050100
4   201903050100  201903050115
..           ...           ...
91  201903052245  201903052300
92  201903052300  201903052315
93  201903052315  201903052330
94  201903052330  201903052345
95  201903052345  201903060000

[96 rows x 2 columns]
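If you would rather not produce the NaN row at all, a slicing variant (a swapped-in technique, not part of the answer above) pairs dates[:-1] with dates[1:] directly. A minimal sketch, assuming the same inputs:

```python
import pandas as pd

start_date = '2019-03-05 00:00:00'
end_date = '2019-03-06 00:00:00'
resolution = '15min'
fmt = '%Y%m%d%H%M'

dates = pd.date_range(start_date, end_date, freq=resolution)

# dates[:-1] are the interval starts and dates[1:] the interval ends;
# both have len(dates) - 1 elements, so the columns line up row by row
df = pd.DataFrame({'Start_time': dates[:-1].strftime(fmt),
                   'end_time': dates[1:].strftime(fmt)})
print(df)
```

This gives the same 96 rows as the iloc[:-1] version, without building and then discarding a shifted copy.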

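Another idea is to use the DataFrame constructor. The difference from the solution above is the last row: instead of NaN, its end_time is the last timestamp plus one resolution step (201903060015):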

import pandas as pd

start_date = '2019-03-05 00:00:00'
end_date = '2019-03-06 00:00:00'
resolution = '15min'

dates = pd.date_range(start_date, end_date, freq = resolution)

# end_time is computed as start + one resolution step instead of taken from the next row
df = pd.DataFrame({'Start_time':dates.strftime('%Y%m%d%H%M'),
                   'end_time': (dates + pd.to_timedelta(resolution)).strftime('%Y%m%d%H%M')})

print (df)
      Start_time      end_time
0   201903050000  201903050015
1   201903050015  201903050030
2   201903050030  201903050045
3   201903050045  201903050100
4   201903050100  201903050115
..           ...           ...
92  201903052300  201903052315
93  201903052315  201903052330
94  201903052330  201903052345
95  201903052345  201903060000
96  201903060000  201903060015

[97 rows x 2 columns]
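If this has to run for arbitrary user input, the constructor approach could be wrapped in a function. This is a sketch only: split_range is a hypothetical name, and it assumes pandas >= 1.4 (for the inclusive parameter of pd.date_range) and that resolution is already a valid pandas alias such as '15min':

```python
import pandas as pd


def split_range(start_date, end_date, resolution, fmt='%Y%m%d%H%M'):
    """Hypothetical helper: one row per resolution-sized interval starting in [start, end)."""
    step = pd.to_timedelta(resolution)
    # inclusive='left' leaves end_date itself out of the starts,
    # so there is no trailing row that begins at end_date
    starts = pd.date_range(start_date, end_date, freq=resolution, inclusive='left')
    return pd.DataFrame({'Start_time': starts.strftime(fmt),
                         'end_time': (starts + step).strftime(fmt)})


df = split_range('2019-03-05 00:00:00', '2019-03-06 00:00:00', '15min')
print(df)
```

Here the output has 96 rows ending at 201903060000, matching the iloc[:-1] result. If the span is not a whole number of steps, the last end_time will fall after end_date.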

Answer 1 (score: 1)

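So you can simply use shift. Note that shift() moves values down one row, so the shifted copy holds the start of each interval:

import pandas as pd

dates = pd.Series(dates).dt.strftime('%Y%m%d%H%M')  # dates from pd.date_range, as in the question
# the first row has no previous timestamp (NaN start), so dropna() removes it
df = pd.concat([dates.shift(), dates], axis=1, keys=('Start_time', 'end_time')).dropna()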
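The direction of shift determines which column the shifted copy belongs in. A minimal illustration on a toy Series (the values are made up):

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c'])

# shift() (periods=1) moves values down: row i now holds the previous value s[i - 1]
print(s.shift().tolist())    # [nan, 'a', 'b']
# shift(-1) moves values up: row i now holds the next value s[i + 1]
print(s.shift(-1).tolist())  # ['b', 'c', nan]
```

So pairing a Series with shift() gives (previous, current) rows, while shift(-1) gives (current, next) rows, which is the order Answer 0 uses.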