Question

Here is the code for my sample dataset:

import pandas as pd

data = {
    'ID': [4, 4, 4, 4, 22, 22, 23, 25, 29],
    'Zone': [32, 34, 21, 34, 27, 29, 32, 75, 9],
    'checkin_datetime': ['04-01-2019 13:07', '04-01-2019 13:09', '04-01-2019 14:06',
                         '04-01-2019 14:55', '04-01-2019 20:23', '04-01-2019 21:38',
                         '04-01-2019 21:38', '04-01-2019 23:22', '04-02-2019 01:00'],
    'checkout_datetime': ['04-01-2019 13:09', '04-01-2019 13:12', '04-01-2019 14:07',
                          '04-01-2019 15:06', '04-01-2019 21:32', '04-01-2019 21:42',
                          '04-01-2019 21:45', '04-02-2019 00:23', '04-02-2019 06:15'],
}

df = pd.DataFrame(data, columns=['ID', 'Zone', 'checkin_datetime', 'checkout_datetime'])

# Strings are parsed month-first (the pandas default), so '04-01-2019' is 1 April 2019
df['checkout_datetime'] = pd.to_datetime(df['checkout_datetime'])
df['checkin_datetime'] = pd.to_datetime(df['checkin_datetime'])

From this dataset, I am trying to produce the following dataset:

                Checked_in_hour    ID    Zone    checked_in_minutes
                01-04-2019 13:00    4    32        2
                01-04-2019 13:00    4    34        3
                01-04-2019 14:00    4    21        1
                01-04-2019 14:00    4    34        5
                01-04-2019 15:00    4    34        6
                01-04-2019 20:00    22    27       37
                01-04-2019 20:00    22    27       8
                01-04-2019 20:00    22    27       37
                01-04-2019 21:00    22    29       4
                01-04-2019 21:00    23    32       7
                01-04-2019 23:00    25    75       38
                02-04-2019 00:00    25    75       24
                02-04-2019 01:00    29    9        60
                02-04-2019 02:00    29    9        60
                02-04-2019 03:00    29    9        60
                02-04-2019 04:00    29    9        60
                02-04-2019 05:00    29    9        60
                02-04-2019 06:00    29    9        16

The checked-in minutes are calculated by subtracting checkin_datetime from checkout_datetime, and that time is then grouped by hour and by Zone.
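To make the rule concrete, here is a minimal sketch for a single row of the sample data (ID 4, Zone 34, checked in 14:55 and out 15:06). It clips the interval to each hour it touches and counts the overlapping minutes. The variable names (`hours`, `minutes_per_hour`) are only illustrative, and the sketch assumes elapsed minutes are what should be counted.

```python
import pandas as pd

# One row of the sample data: ID 4, Zone 34.
checkin = pd.Timestamp('2019-04-01 14:55')
checkout = pd.Timestamp('2019-04-01 15:06')

# Every hour bucket the interval touches, from the check-in hour to the check-out hour.
hours = pd.date_range(checkin.floor('h'), checkout.floor('h'), freq='h')

minutes_per_hour = {}
for hour_start in hours:
    hour_end = hour_start + pd.Timedelta(hours=1)
    # Overlap between [checkin, checkout] and [hour_start, hour_end].
    overlap = min(checkout, hour_end) - max(checkin, hour_start)
    minutes_per_hour[hour_start] = int(overlap.total_seconds() // 60)

# 5 minutes fall in the 14:00 hour and 6 minutes in the 15:00 hour.
for hour_start, minutes in minutes_per_hour.items():
    print(hour_start, minutes)
```

This matches the two Zone 34 rows for 14:00 and 15:00 in the table above.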

This is my code so far. It does the calculation at the Checked_in_hour level; I still need to add the Zone variable.

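# working logic
# Minute-by-minute grid covering the whole observation window.
# pd.date_range is used because pd.DatetimeIndex(start=..., end=...) was removed in pandas 1.0.
df2 = pd.DataFrame(
    index=pd.date_range(
        start=df['checkin_datetime'].min(),
        end=df['checkout_datetime'].max(), freq='min'),
    columns=['is_checked_in', 'ID'], data=0)

for index, row in df.iterrows():
    # .loc avoids chained assignment; label slices include both endpoints
    df2.loc[row['checkin_datetime']:row['checkout_datetime'], 'is_checked_in'] = 1
    df2.loc[row['checkin_datetime']:row['checkout_datetime'], 'ID'] = row['ID']

df3 = df2.resample('h').aggregate({'is_checked_in': 'sum', 'ID': 'max'})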

Answer 1

I'm not sure this is efficient, but it should work.

import pandas as pd
from datetime import timedelta

def group_into_hourly_buckets(df):
    grouped_data = []
    for idx, row in df.iterrows():
        start_time = row['checkin_datetime']
        # total_seconds() keeps whole days; .seconds alone would drop them
        dur = int((row['checkout_datetime'] - start_time).total_seconds() // 60)
        hours_ = 0
        while dur > 0:
            _data = {}
            _data['Checked_in_hour'] = start_time.floor('h') + timedelta(hours=hours_)
            time_spent_in_window = min(dur, 60)
            if hours_ == 0:
                # Minutes left in the first hour; this is 60 (not 0) when the
                # check-in falls exactly on the hour
                time_spent_in_window = min(time_spent_in_window, 60 - start_time.minute)
            _data['checked_in_minutes'] = time_spent_in_window
            _data['ID'] = row['ID']
            _data['Zone'] = row['Zone']
            dur -= time_spent_in_window
            hours_ += 1
            grouped_data.append(_data)
    return pd.DataFrame(grouped_data)
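
For comparison, the same splitting can probably be done without a Python-level loop over minutes. The sketch below is not the code from this answer: the function name `hourly_minutes`, the use of `explode`, and the final `groupby(...).sum()` step are assumptions added for illustration. It expands each row into the hours it touches, clips the interval to each hour, and sums the overlap per hour, ID and Zone.

```python
import pandas as pd

def hourly_minutes(df):
    """Split check-in intervals into per-hour minute counts (illustrative sketch)."""
    out = df[['ID', 'Zone', 'checkin_datetime', 'checkout_datetime']].copy()
    # One list of hour starts per row, from the check-in hour to the check-out hour.
    out['Checked_in_hour'] = pd.Series(
        [list(pd.date_range(start.floor('h'), end.floor('h'), freq='h'))
         for start, end in zip(out['checkin_datetime'], out['checkout_datetime'])],
        index=out.index)
    out = out.explode('Checked_in_hour').reset_index(drop=True)
    out['Checked_in_hour'] = pd.to_datetime(out['Checked_in_hour'])
    # Clip each interval to its hour bucket and measure the overlap in minutes.
    hour_start = out['Checked_in_hour']
    hour_end = hour_start + pd.Timedelta(hours=1)
    start = out['checkin_datetime'].where(out['checkin_datetime'] > hour_start, hour_start)
    end = out['checkout_datetime'].where(out['checkout_datetime'] < hour_end, hour_end)
    out['checked_in_minutes'] = ((end - start).dt.total_seconds() // 60).astype(int)
    # Drop empty buckets (a check-out exactly on the hour).
    out = out[out['checked_in_minutes'] > 0]
    return out.groupby(['Checked_in_hour', 'ID', 'Zone'], as_index=False)['checked_in_minutes'].sum()

# Four rows taken from the question's sample data.
sample = pd.DataFrame({
    'ID': [4, 22, 23, 29],
    'Zone': [34, 29, 32, 9],
    'checkin_datetime': pd.to_datetime(['2019-04-01 14:55', '2019-04-01 21:38',
                                        '2019-04-01 21:38', '2019-04-02 01:00']),
    'checkout_datetime': pd.to_datetime(['2019-04-01 15:06', '2019-04-01 21:42',
                                         '2019-04-01 21:45', '2019-04-02 06:15']),
})
result = hourly_minutes(sample)
print(result)
```

Because this counts elapsed minutes, the last hour for ID 29 comes out as 15 (06:00 to 06:15), whereas the table in the question lists 16; if the end minute should be counted as well, the overlap calculation would need adjusting.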

Dividing time intervals with multiple indices into hourly buckets in Python
