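Problem: select from a pandas DatetimeIndex by weekday and time of day. For example, I want to select all items between Tuesday 20:00 and Friday 06:00.
Question: is there a better solution than the one below?
I have a working solution (see below), but I don't particularly like it.
My working example: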
import calendar
from datetime import time

import pandas as pd

# The DatetimeIndex to select from
# (freq='H' is spelled 'h' in pandas 2.2 and later)
idx = pd.date_range('2019-01-01', '2019-01-31', freq='H')


# Converts anything with .hour, .minute and .second (a datetime.time or a
# DatetimeIndex) to a time-of-day fraction in [0, 1)
def datetime_to_time_frac(t):
    return t.hour / 24 + t.minute / (24 * 60) + t.second / (24 * 60 * 60)


# Converts a DatetimeIndex to floats representing weekday (Monday: 0 to
# Sunday: 6) + time-of-day fraction in [0, 1)
def datetime_to_weekday_time_frac(t):
    # .weekday is an attribute on a DatetimeIndex, not a method call
    return t.weekday + datetime_to_time_frac(t)


# DatetimeIndex converted to float
idx_conv = datetime_to_weekday_time_frac(idx)

# Boolean mask selecting items between Tuesday 20:00 and Friday 06:00
mask = (
    (idx_conv >= calendar.TUESDAY + datetime_to_time_frac(time(20, 0)))
    & (idx_conv <= calendar.FRIDAY + datetime_to_time_frac(time(6, 0)))
)

# Validation of the mask in a pivot table
df = pd.DataFrame(index=idx[mask])
df['Date'] = df.index.date
df['Weekday'] = df.index.weekday
weekdays = list(calendar.day_abbr)
df['WeekdayName'] = df.Weekday.map(lambda x: weekdays[x])
df['Hour'] = df.index.hour
df.pivot_table(index=['Date', 'WeekdayName'], columns='Hour', values='Weekday', aggfunc='count')
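The final pivot output shows that the code does the right thing, but I have the feeling there is a more elegant, more idiomatic way to solve this.
(The code targets Python 3 with a recent pandas.)
Answer 0 (score: 0)
It looks like you can index this more cleanly with the indexing features built into pandas. I avoid converting times to fractional values, and to be clear, what I did only works for whole hours. The basic difference is that it uses pandas built-ins and avoids importing calendar. Here is what I did. It is mostly equivalent to your very specific Tues-Fri example, but if you only need whole-hour intervals you can adapt it to more general cases.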
import pandas as pd

idx = pd.date_range('2019-01-01', '2019-01-31', freq='H')
df = pd.DataFrame(index=idx)

# Build a filter for each part of your weekly interval (Monday: 0 to Sunday: 6).
# Comparing df.index.hour only works for boundaries that fall on whole hours.
tues = (df.index.weekday == 1) & (df.index.hour >= 20)  # Tuesday from 20:00
weds_thurs = df.index.weekday.isin([2, 3])              # all of Wednesday and Thursday
fri = (df.index.weekday == 4) & (df.index.hour <= 6)    # Friday up to and including the 06:xx hour

# The mask is just the union of all those conditions
mask = tues | weds_thurs | fri

# Now apply the mask; the rest is basically what you were doing
df = df.loc[mask]
df['Date'] = df.index.date
df['Weekday'] = df.index.weekday
# weekday_name was removed in pandas 1.0; day_name() returns the full day names
df['WeekdayName'] = df.index.day_name()
df['Hour'] = df.index.hour
df.pivot_table(index=['Date', 'WeekdayName'], columns='Hour', values='Weekday', aggfunc='count')
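The adaptation to a more general whole-hour interval mentioned above could look roughly like the sketch below. It is not part of the original answer: `hour_of_week_mask` is a hypothetical helper name, and the sketch assumes that both boundaries fall on whole hours and that the index has hourly resolution. It reduces each timestamp to an integer count of hours since Monday 00:00, so a single pair of comparisons covers the whole interval, including intervals that wrap around the end of the week.

```python
import pandas as pd


def hour_of_week_mask(idx, start_day, start_hour, end_day, end_hour):
    """Hypothetical helper: select whole hours between two weekly boundaries.

    Days use the pandas convention (Monday: 0 to Sunday: 6). Both
    boundary hours are included.
    """
    # Hours elapsed since Monday 00:00, as integers in [0, 167]
    hour_of_week = idx.weekday * 24 + idx.hour
    start = start_day * 24 + start_hour
    end = end_day * 24 + end_hour
    if start <= end:
        return (hour_of_week >= start) & (hour_of_week <= end)
    # The interval wraps around the end of the week (e.g. Sunday to Monday)
    return (hour_of_week >= start) | (hour_of_week <= end)


# Hourly index; a Timedelta is passed as freq so no frequency alias is needed
idx = pd.date_range('2019-01-01', '2019-01-31', freq=pd.Timedelta(hours=1))

# Tuesday 20:00 through Friday 06:00
mask = hour_of_week_mask(idx, 1, 20, 4, 6)
selected = idx[mask]
print(selected[0], selected[-1])
```

With sub-hourly data the end boundary would also match 06:15, 06:30 and so on, because only the hour is compared.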
Answer 1 (score: 0)
The following should do what you need:
from datetime import datetime


def make_date_mask(day_start, time_start, day_end, time_end, series):
    # Days are integers (Monday: 0 to Sunday: 6); times are "HH:MM:SS" strings;
    # series is a pandas Series of datetimes
    flipped = False
    if day_start > day_end:
        # Need to flip the ordering, then negate at the end
        day_start, time_start, day_end, time_end = (
            day_end, time_end, day_start, time_start
        )
        flipped = True

    time_start = datetime.strptime(time_start, "%H:%M:%S").time()
    time_end = datetime.strptime(time_end, "%H:%M:%S").time()

    # Get everything for the specified days, inclusive
    mask = series.dt.dayofweek.between(day_start, day_end)

    # Filter things that happen before the beginning of the start time
    # of the start day
    mask = mask & ~(
        (series.dt.dayofweek == day_start)
        & (series.dt.time < time_start)
    )

    # Filter things that happen after the ending time of the end day
    mask = mask & ~(
        (series.dt.dayofweek == day_end)
        & (series.dt.time > time_end)
    )

    if flipped:
        # Negate the mask to get the actual result and add in the
        # times that were exactly on the boundaries, just in case
        mask = ~mask | (
            (series.dt.dayofweek == day_start)
            & (series.dt.time == time_start)
        ) | (
            (series.dt.dayofweek == day_end)
            & (series.dt.time == time_end)
        )

    return mask
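Using it on your example data:
import pandas as pd

df = pd.DataFrame({
    "dates": pd.date_range('2019-01-01', '2019-01-31', freq='H')
})
# Sunday (6) 23:00:00 through Monday (0) 00:30:00
filtered_df = df[make_date_mask(6, "23:00:00", 0, "00:30:00", df["dates"])]
print(filtered_df)
filtered_df is as follows: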
dates
143 2019-01-06 23:00:00
144 2019-01-07 00:00:00
311 2019-01-13 23:00:00
312 2019-01-14 00:00:00
479 2019-01-20 23:00:00
480 2019-01-21 00:00:00
647 2019-01-27 23:00:00
648 2019-01-28 00:00:00
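When the boundaries do not fall on whole hours, another option is to compare each timestamp's offset from the start of its week with two `pd.Timedelta` values. This is a minimal sketch of that idea rather than something taken from the answers above. It assumes a timezone-naive DatetimeIndex, and `week_offset` is a hypothetical helper name. It is run here on the question's Tuesday 20:00 to Friday 06:00 interval with half-hourly data.

```python
import pandas as pd


def week_offset(idx):
    """Hypothetical helper: time elapsed since Monday 00:00 of each timestamp's week."""
    # Whole days since Monday (Monday: 0 to Sunday: 6) as a TimedeltaIndex
    days = pd.to_timedelta(idx.weekday, unit="D")
    # Time of day as the difference from midnight of the same date
    time_of_day = idx - idx.normalize()
    return days + time_of_day


# Half-hourly index, so the boundaries are tested below hour resolution
idx = pd.date_range("2019-01-01", "2019-01-31", freq=pd.Timedelta(minutes=30))

start = pd.Timedelta(days=1, hours=20)  # Tuesday 20:00
end = pd.Timedelta(days=4, hours=6)     # Friday 06:00

offset = week_offset(idx)
mask = (offset >= start) & (offset <= end)
selected = idx[mask]
print(selected[0], selected[-1])
```

For an interval that wraps around the end of the week, where `start` is greater than `end`, the two comparisons would be combined with `|` instead of `&`.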