I want to find the total number of occupied parking spaces per day, based on each booking's entry date and exit date.
    Booking_Date         Entry_date           Exit_date
0   2017-01-09 08:22:23  2017-12-22 03:00:00  2018-01-05 19:00:00
1   2017-01-09 13:54:08  2017-12-12 11:30:00  2018-01-04 12:30:00
2   2017-01-11 21:37:08  2017-12-29 08:00:00  2018-01-01 23:00:00
3   2017-01-15 15:20:12  2017-12-21 12:00:00  2018-01-08 07:00:00
4   2017-01-18 00:00:00  2017-12-23 05:25:00  2018-01-06 13:25:00
5   2017-01-19 04:33:13  2017-12-27 08:00:00  2018-01-03 17:00:00
6   2017-01-19 10:20:00  2017-12-29 06:00:00  2018-01-05 21:00:00
7   2017-01-22 00:20:46  2017-12-31 06:00:00  2018-01-05 08:00:00
8   2017-01-22 19:51:10  2017-12-02 06:00:00  2018-01-02 13:00:00
9   2017-01-23 11:17:34  2018-01-04 06:30:00  2018-01-04 07:00:00
10  2017-01-23 14:43:56  2018-01-09 06:30:00  2018-01-16 07:00:00
11  2017-01-24 12:38:41  2017-12-19 12:00:00  2018-01-10 23:00:00
12  2017-01-26 10:05:01  2017-12-30 05:30:00  2018-01-06 15:00:00
13  2017-01-26 10:05:01  2017-12-30 05:30:00  2018-01-06 15:00:00
14  2017-01-26 10:05:01  2018-01-02 04:30:00  2018-01-06 15:00:00
15  2017-01-25 14:02:20  2017-12-31 06:00:00  2018-01-14 20:00:00
16  2017-01-28 14:22:15  2017-12-22 06:00:00  2018-01-04 10:00:00
17  2017-01-28 16:23:51  2017-12-30 07:00:00  2018-01-02 14:00:00
18  2017-01-29 16:18:27  2017-12-21 08:00:00  2018-01-02 09:30:00
19  2017-01-29 18:20:17  2017-12-28 06:00:00  2018-01-04 21:00:00
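The question doesn't state exactly when a booking counts toward a given day. A minimal sketch, assuming a booking occupies a space on every calendar day from its entry day through its exit day (inclusive), counts a single day with a boolean mask. `occupancy_on` is a hypothetical helper name, and only three of the sample rows are used:

```python
import pandas as pd

# Three rows taken from the sample data above (Entry_date / Exit_date only)
bookings = pd.DataFrame({
    "Entry_date": pd.to_datetime(["2017-12-22 03:00:00",
                                  "2017-12-12 11:30:00",
                                  "2018-01-04 06:30:00"]),
    "Exit_date": pd.to_datetime(["2018-01-05 19:00:00",
                                 "2018-01-04 12:30:00",
                                 "2018-01-04 07:00:00"]),
})


def occupancy_on(day, df):
    """Count bookings whose stay touches calendar day `day`.

    Assumption (not from the question): the entry day and the exit day
    both count as occupied days.
    """
    day = pd.Timestamp(day).normalize()
    entry_day = df["Entry_date"].dt.normalize()  # drop the time of day
    exit_day = df["Exit_date"].dt.normalize()
    return int(((entry_day <= day) & (exit_day >= day)).sum())


print(occupancy_on("2018-01-04", bookings))  # → 3
print(occupancy_on("2017-12-15", bookings))  # → 1
```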
My full dataset already has 600,000 bookings, so I'm trying to make this as efficient as possible.
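One way to avoid a Python-level loop over days is NumPy broadcasting: compare every day in the range against every booking in one vectorized expression. This sketch uses the same inclusive entry-day-to-exit-day assumption and three sample rows. The intermediate boolean matrix has one cell per (day, booking) pair, so for 34 days and 600,000 bookings it holds about 20 million one-byte values; workable, but memory grows with both dimensions:

```python
import numpy as np
import pandas as pd

# Three rows taken from the sample data in the question
bookings = pd.DataFrame({
    "Entry_date": pd.to_datetime(["2017-12-22 03:00:00",
                                  "2017-12-12 11:30:00",
                                  "2018-01-04 06:30:00"]),
    "Exit_date": pd.to_datetime(["2018-01-05 19:00:00",
                                 "2018-01-04 12:30:00",
                                 "2018-01-04 07:00:00"]),
})

days = pd.date_range("2017-12-10", "2018-01-12")  # 34 calendar days

# Calendar day of each entry/exit as datetime64 arrays, shape (n_bookings,)
entry_day = bookings["Entry_date"].dt.normalize().to_numpy()
exit_day = bookings["Exit_date"].dt.normalize().to_numpy()

# Days as a column vector, shape (n_days, 1), so comparisons broadcast
day_col = days.to_numpy()[:, np.newaxis]

# covers[i, j] is True when booking j is present on day i
covers = (entry_day <= day_col) & (exit_day >= day_col)

occupancy = pd.DataFrame({"date": days, "occupancy": covers.sum(axis=1)})
print(occupancy)
```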
import time
import pandas as pd
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500) ##Terminal view sizing
pd.set_option('display.width', 1000)
start = time.perf_counter()
booking_date = ['2017-01-09 08:22:23',
'2017-01-09 13:54:08',
'2017-01-11 21:37:08',
'2017-01-15 15:20:12',
'2017-01-18 00:00:00',
'2017-01-19 04:33:13',
'2017-01-19 10:20:00',
'2017-01-22 00:20:46',
'2017-01-22 19:51:10',
'2017-01-23 11:17:34',
'2017-01-23 14:43:56',
'2017-01-24 12:38:41',
'2017-01-26 10:05:01',
'2017-01-26 10:05:01',
'2017-01-26 10:05:01',
'2017-01-25 14:02:20',
'2017-01-28 14:22:15',
'2017-01-28 16:23:51',
'2017-01-29 16:18:27',
'2017-01-29 18:20:17']
entry_date = ['2017-12-22 03:00:00',
'2017-12-12 11:30:00',
'2017-12-29 08:00:00',
'2017-12-21 12:00:00',
'2017-12-23 05:25:00',
'2017-12-27 08:00:00',
'2017-12-29 06:00:00',
'2017-12-31 06:00:00',
'2017-12-02 06:00:00',
'2018-01-04 06:30:00',
'2018-01-09 06:30:00',
'2017-12-19 12:00:00',
'2017-12-30 05:30:00',
'2017-12-30 05:30:00',
'2018-01-02 04:30:00',
'2017-12-31 06:00:00',
'2017-12-22 06:00:00',
'2017-12-30 07:00:00',
'2017-12-21 08:00:00',
'2017-12-28 06:00:00']
exit_date = ['2018-01-05 19:00:00',
'2018-01-04 12:30:00',
'2018-01-01 23:00:00',
'2018-01-08 07:00:00',
'2018-01-06 13:25:00',
'2018-01-03 17:00:00',
'2018-01-05 21:00:00',
'2018-01-05 08:00:00',
'2018-01-02 13:00:00',
'2018-01-04 07:00:00',
'2018-01-16 07:00:00',
'2018-01-10 23:00:00',
'2018-01-06 15:00:00',
'2018-01-06 15:00:00',
'2018-01-06 15:00:00',
'2018-01-14 20:00:00',
'2018-01-04 10:00:00',
'2018-01-02 14:00:00',
'2018-01-02 09:30:00',
'2018-01-04 21:00:00',]
data ={'Booking_Date':booking_date,'Entry_date': entry_date, 'Exit_date': exit_date}
Exit_df = pd.DataFrame(data, columns=['Booking_Date','Entry_date','Exit_date'])  # booking DataFrame: Entry_date, Exit_date, booking cost, etc.
Exit_df['Booking_Date'] = pd.to_datetime(Exit_df['Booking_Date'])
Exit_df['Entry_date'] = pd.to_datetime(Exit_df['Entry_date']) ### formatting data types
Exit_df['Exit_date'] = pd.to_datetime(Exit_df['Exit_date'])
print(Exit_df.head(20))
start_date = pd.to_datetime('2017-12-10')  # or derive from the data: Exit_df['Entry_date'].min().normalize()
end_date = pd.to_datetime('2018-01-12')  # or: Exit_df['Exit_date'].max().normalize()
print(start_date,end_date)
occupancy_df = pd.DataFrame(pd.date_range(start_date,end_date,))
occupancy_df.columns = ['occupancy date']
print(occupancy_df)
occupancy date
0 2017-12-10
1 2017-12-11
2 2017-12-12
3 2017-12-13
4 2017-12-14
5 2017-12-15
6 2017-12-16
7 2017-12-17
8 2017-12-18
9 2017-12-19
10 2017-12-20
11 2017-12-21
12 2017-12-22
13 2017-12-23
14 2017-12-24
15 2017-12-25
16 2017-12-26
17 2017-12-27
18 2017-12-28
19 2017-12-29
20 2017-12-30
21 2017-12-31
22 2018-01-01
23 2018-01-02
24 2018-01-03
25 2018-01-04
26 2018-01-05
27 2018-01-06
28 2018-01-07
29 2018-01-08
30 2018-01-09
31 2018-01-10
32 2018-01-11
33 2018-01-12
I'd like the output to be another DataFrame with one column for the date and one column for the number of occupied spaces on that date.
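A sketch that returns that shape of output uses a difference array (prefix sum), again assuming a booking occupies a space on every calendar day from its entry day through its exit day inclusive. Each booking contributes +1 on its entry day and -1 on the day after its exit day; a cumulative sum over consecutive days then gives the number of cars present. The work is a `value_counts` pass over the bookings plus a cumsum over the calendar, so it scales roughly linearly with the number of rows instead of rows × days. `daily_occupancy` and the `date`/`occupancy` column names are hypothetical:

```python
import pandas as pd


def daily_occupancy(df, start, end):
    """Number of bookings present on each calendar day in [start, end].

    Assumption: a booking occupies a space on every day from its entry
    day through its exit day, inclusive.
    """
    days = pd.date_range(start, end)  # daily frequency by default
    entry_day = df["Entry_date"].dt.normalize()
    exit_day = df["Exit_date"].dt.normalize()

    # +1 on each entry day, -1 on the day after each exit day
    plus = entry_day.value_counts()
    minus = (exit_day + pd.Timedelta(days=1)).value_counts()
    delta = plus.sub(minus, fill_value=0)

    # Spread the changes over an unbroken daily calendar so bookings that
    # started before `start` are already included in the running total
    lo = min(entry_day.min(), days[0])
    hi = max(delta.index.max(), days[-1])
    calendar = pd.date_range(lo, hi)
    running = delta.reindex(calendar, fill_value=0).cumsum()

    return pd.DataFrame({
        "date": days,
        "occupancy": running.reindex(days).astype(int).to_numpy(),
    })


# Three rows taken from the sample data in the question
bookings = pd.DataFrame({
    "Entry_date": pd.to_datetime(["2017-12-22 03:00:00",
                                  "2017-12-12 11:30:00",
                                  "2018-01-04 06:30:00"]),
    "Exit_date": pd.to_datetime(["2018-01-05 19:00:00",
                                 "2018-01-04 12:30:00",
                                 "2018-01-04 07:00:00"]),
})

result = daily_occupancy(bookings, "2017-12-10", "2018-01-12")
print(result)
```

With the question's variables this would be called as `daily_occupancy(Exit_df, start_date, end_date)`. If the exit day should not count as occupied (for example, counting only nights), removing the one-day shift puts the -1 on the exit day itself.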
Thanks!