I have a file containing the following table:
Name AvailableDate totalRemaining
0 X3321 2018-03-14 13:00:00 200
1 X3321 2018-03-14 14:00:00 200
2 X3321 2018-03-14 15:00:00 200
3 X3321 2018-03-14 16:00:00 200
4 X3321 2018-03-14 17:00:00 193
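I want to return a DataFrame containing all records that fall within a specific time-of-day window, regardless of the actual date.
I followed this example:
filter pandas dataframe by time
But when I run the following: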
## setup
import pandas as pd
import numpy as np
### Step 2
### Check available slots
file2 = r'C:\Users\user\Desktop\Files\data.xlsx'
slots = pd.read_excel(file2,na_values='')
## filter the preferred ones
slots['nextAvailableDate'] = pd.to_datetime((slots['nextAvailableDate']))
slots['times'] = pd.to_datetime((slots['nextAvailableDate']))
slots = slots[slots['times'].between('21:00:00', '02:00:00')]
This returns an empty DataFrame, and so does this attempt:
slots = slots[slots['times'].dt.strftime('%H:%M:%S').between('21:00:00', '02:00:00')]
Is there a way to do this correctly without creating a separate time column? How can I solve this?
My goal:
Name AvailableDate totalRemaining
0 X3321 2018-03-14 21:00:00 200
1 X3321 2018-03-14 22:00:00 200
2 X3321 2018-03-14 23:00:00 200
3 X3321 2018-03-14 00:00:00 200
4 X3321 2018-03-14 01:00:00 193
for each day that appears in the dataset.
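Answer 0: (score: 4)
I think you need between_time, which works on a DatetimeIndex created by set_index. Then call reset_index to turn the index back into a column, and reindex with the original columns to restore the original column order: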
print (slots)
Name AvailableDate totalRemaining
0 X3321 2018-03-14 21:00:00 200
1 X3321 2018-03-14 20:00:00 200
2 X3321 2018-03-14 22:00:00 200
3 X3321 2018-03-14 23:00:00 200
4 X3321 2018-03-14 00:00:00 200
5 X3321 2018-03-14 01:00:00 193
6 X3321 2018-03-14 13:00:00 200
7 X3321 2018-03-14 14:00:00 200
8 X3321 2018-03-14 15:00:00 200
9 X3321 2018-03-14 16:00:00 200
10 X3321 2018-03-14 17:00:00 193
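# between_time needs a DatetimeIndex, so convert the column and index by it
slots['AvailableDate'] = pd.to_datetime(slots['AvailableDate'])
df = (slots.set_index('AvailableDate')
           .between_time('21:00:00', '02:00:00')
           .reset_index()
           # reuse the original frame's columns to restore the column order
           .reindex(columns=slots.columns))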
print (df)
Name AvailableDate totalRemaining
0 X3321 2018-03-14 21:00:00 200
1 X3321 2018-03-14 22:00:00 200
2 X3321 2018-03-14 23:00:00 200
3 X3321 2018-03-14 00:00:00 200
4 X3321 2018-03-14 01:00:00 193
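Since the question asks for a way that avoids an extra column, here is a minimal sketch of a variant that leaves the frame's index and columns untouched. It builds a temporary DatetimeIndex from the column and uses DatetimeIndex.indexer_between_time to get row positions. The sample data below is made up for illustration and spans two days; it is not the data from the question.

```python
import pandas as pd

# Hypothetical sample data spanning two days (not from the original post).
slots = pd.DataFrame({
    'Name': ['X3321'] * 6,
    'AvailableDate': pd.to_datetime([
        '2018-03-14 13:00:00', '2018-03-14 21:00:00', '2018-03-14 23:00:00',
        '2018-03-15 00:00:00', '2018-03-15 01:00:00', '2018-03-15 17:00:00',
    ]),
    'totalRemaining': [200, 200, 200, 200, 193, 200],
})

# Build a throwaway DatetimeIndex from the column; slots itself is not modified.
# indexer_between_time returns integer row positions and wraps past midnight
# when the start time is later than the end time.
positions = pd.DatetimeIndex(slots['AvailableDate']).indexer_between_time('21:00', '02:00')

# Select the matching rows by position; original index and column order are kept.
night = slots.iloc[positions]
print(night)
```

Because the rows are picked with iloc, there is no need for set_index, reset_index, or reindex afterwards.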
Answer 1: (score: 2)
You can use pd.Series.between with datetime.time objects, as shown below.
from datetime import datetime
start = datetime.strptime('21:00:00', '%H:%M:%S').time()
end = datetime.strptime('02:00:00', '%H:%M:%S').time()
slots = slots[slots['times'].dt.time.between(start, end)]
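One caveat for the window in the question: Series.between checks start <= value <= end, so when the start time is later than the end time (21:00 to 02:00, crossing midnight) no value can satisfy it and the result is empty. A minimal sketch of one way around this, using made-up sample times, is to combine the two comparisons with OR whenever the window wraps:

```python
from datetime import time
import pandas as pd

# Hypothetical sample timestamps (not from the original post).
times = pd.Series(pd.to_datetime([
    '2018-03-14 20:00:00', '2018-03-14 21:00:00',
    '2018-03-14 23:00:00', '2018-03-15 01:00:00', '2018-03-15 13:00:00',
]))

start, end = time(21, 0), time(2, 0)
tod = times.dt.time  # time-of-day part only, as datetime.time objects

# Plain between: start <= t <= end can never hold when start > end.
plain = tod.between(start, end)

# Wrap-aware mask: AND for a same-day window, OR for one that crosses midnight.
if start <= end:
    mask = (tod >= start) & (tod <= end)
else:
    mask = (tod >= start) | (tod <= end)

print(times[mask])
```

For windows that do not cross midnight, such as 21:00 to 23:30, plain between is sufficient.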
Usage example
from datetime import datetime
import pandas as pd
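slots = pd.DataFrame({'times': ['2018-03-08 05:00:00', '2018-03-08 07:00:00',
                                '2018-03-08 01:00:00', '2018-03-08 20:00:00',
                                '2018-03-08 22:00:00', '2018-03-08 23:00:00']})
# the .dt accessor requires a datetime column, so convert the strings first
slots['times'] = pd.to_datetime(slots['times'])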
start = datetime.strptime('21:00:00', '%H:%M:%S').time()
end = datetime.strptime('23:30:00', '%H:%M:%S').time()
slots = slots[slots['times'].dt.time.between(start, end)]
# times
# 4 2018-03-08 22:00:00
# 5 2018-03-08 23:00:00