Suppose I have the following order log for a pizzeria:
import pandas as pd
csv = [
['2019-05-01', '2019-05-01 18:30', 'pepperoni', 'small'],
['2019-05-01', '2019-05-01 21:00', 'pineapple', 'big'],
['2019-05-01', '2019-05-01 22:30', 'pepperoni', 'big'],
['2019-05-02', '2019-05-02 19:00', 'pineapple', 'small'],
['2019-05-02', '2019-05-02 20:30', 'pineapple', 'big'],
['2019-05-02', '2019-05-02 23:00', 'pepperoni', 'small']]
df = pd.DataFrame(csv, columns=["Working day", "Time of order", "Pizza type", "Pizza size"])
df["Working day"] = (pd.to_datetime(df["Working day"]))
df["Time of order"] = (pd.to_datetime(df["Time of order"]))
df = df.set_index(['Working day','Time of order'])
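Now I have a MultiIndex DataFrame and want to run some analysis on it. To do that, I want to build time series keyed on the first index level (Working day) by applying conditions to the second index level (Time of order) or to the other columns.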
For example, some desired outputs:
For each day, the order closest to 19:00:00:
                                Pizza type Pizza size
Working day Time of order
2019-05-01  2019-05-01 18:30:00  pepperoni      small
2019-05-02  2019-05-02 19:00:00  pineapple      small
For each day, the first order at or after 19:00:00:
                                Pizza type Pizza size
Working day Time of order
2019-05-01  2019-05-01 21:00:00  pineapple        big
2019-05-02  2019-05-02 19:00:00  pineapple      small
For each day, the latest order of a big pizza:
                                Pizza type Pizza size
Working day Time of order
2019-05-01  2019-05-01 22:30:00  pepperoni        big
2019-05-02  2019-05-02 20:30:00  pineapple        big
For each day, the order placed at 22:30:00:
                                Pizza type Pizza size
Working day Time of order
2019-05-01  2019-05-01 22:30:00  pepperoni        big
2019-05-02  NaT                        NaN        NaN
And so on. How can I do this?
Answer 0 (score: 0)
Instead of using a MultiIndex, try computing the difference directly on the Time of order column:
### Skipping the `df = df.set_index(['Working day','Time of order'])` step:
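# Signed difference, in seconds, from 19:00 on the same working day.
# (pd.to_datetime('19:00') would resolve to 19:00 on today's date, and .dt.seconds
# wraps negative differences, so the target is built from "Working day" instead.)
df['time_difference'] = (df['Time of order'] - (df['Working day'] + pd.Timedelta(hours=19))).dt.total_seconds()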
The .dt accessor can be used to extract information from pandas datetime objects (https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#basics-dt-accessors).
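As a small illustration of the .dt accessor, here is a sketch on a throwaway Series. The variable names (times, since_midnight, early) are made up for this example. It also shows why .dt.total_seconds() is usually the safer choice for signed differences: .dt.seconds returns only the seconds component of a timedelta, so a negative difference wraps around.

```python
import pandas as pd

# Hypothetical sample timestamps, not part of the original log
times = pd.Series(pd.to_datetime(["2019-05-01 18:30", "2019-05-02 23:00"]))

# Datetime accessors: pull individual fields out of each timestamp
hours = times.dt.hour                            # 18, 23
since_midnight = times - times.dt.normalize()    # timedelta since 00:00 of the same day
secs = since_midnight.dt.total_seconds()         # 66600.0, 82800.0

# Timedelta accessors: .seconds is a component, .total_seconds() is the signed total
early = pd.Series([pd.Timedelta(minutes=-30)])
wrapped = early.dt.seconds                       # 84600 (stored as -1 day + 84600 s)
signed = early.dt.total_seconds()                # -1800.0
```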
Once the difference is computed, the new time_difference column can be used to answer some of the specific questions.
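For instance, the four outputs from the question could be produced roughly as below. This is a sketch, not the only way to do it: it assumes time_difference holds the signed number of seconds relative to 19:00 on the same working day, and the result names (closest, first_after, latest_big, at_2230) are invented for the example. The results keep the default integer index with Working day as a regular column; call set_index(['Working day', 'Time of order']) on any of them if the MultiIndex layout is needed.

```python
import pandas as pd

rows = [
    ['2019-05-01', '2019-05-01 18:30', 'pepperoni', 'small'],
    ['2019-05-01', '2019-05-01 21:00', 'pineapple', 'big'],
    ['2019-05-01', '2019-05-01 22:30', 'pepperoni', 'big'],
    ['2019-05-02', '2019-05-02 19:00', 'pineapple', 'small'],
    ['2019-05-02', '2019-05-02 20:30', 'pineapple', 'big'],
    ['2019-05-02', '2019-05-02 23:00', 'pepperoni', 'small']]
df = pd.DataFrame(rows, columns=["Working day", "Time of order", "Pizza type", "Pizza size"])
df["Working day"] = pd.to_datetime(df["Working day"])
df["Time of order"] = pd.to_datetime(df["Time of order"])

# Assumption: signed seconds relative to 19:00 on the same working day
target = df["Working day"] + pd.Timedelta(hours=19)
df["time_difference"] = (df["Time of order"] - target).dt.total_seconds()

# 1) Per day, the order closest to 19:00 (smallest absolute difference)
closest = df.loc[df["time_difference"].abs().groupby(df["Working day"]).idxmin()]

# 2) Per day, the first order at or after 19:00 (smallest non-negative difference)
after = df[df["time_difference"] >= 0]
first_after = after.loc[after.groupby("Working day")["time_difference"].idxmin()]

# 3) Per day, the latest order of a big pizza
big = df[df["Pizza size"] == "big"]
latest_big = big.loc[big.groupby("Working day")["Time of order"].idxmax()]

# 4) Per day, the order placed exactly at 22:30 (3.5 hours after 19:00).
#    Reindexing on all working days keeps days without a match as NaT/NaN rows;
#    this assumes at most one order per day at that exact time.
offset = pd.Timedelta(hours=3, minutes=30).total_seconds()
all_days = pd.Index(df["Working day"].unique(), name="Working day")
at_2230 = df[df["time_difference"] == offset].set_index("Working day").reindex(all_days)
```

Filtering first and grouping second keeps each step easy to inspect; for a large log, the same idea should carry over unchanged.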