How to do an hourly count on a pandas DataFrame

Asked: 2018-06-28 08:26:03

Tags: python pandas dataframe timestamp

I take observations over multiple days, and a single customer can be observed on several of them. This is my data:

customer_id   value    timestamp
1             1000     2018-05-28 03:40:00.000
1             1450     2018-05-28 04:40:01.000
1             1040     2018-05-28 05:40:00.000
1             1500     2018-05-29 02:40:00.000
1             1090     2018-05-29 04:40:00.000
3             1060     2018-05-18 03:40:00.000
3             1040     2018-05-18 05:40:00.000
3             1520     2018-05-19 03:40:00.000
3             1490     2018-05-19 04:40:00.000
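For reproducibility, here is a minimal sketch of how this sample could be built as a DataFrame (the construction itself is illustrative, not part of the original question):

import pandas as pd

# hypothetical reconstruction of the sample data shown above
df = pd.DataFrame({
    'customer_id': [1, 1, 1, 1, 1, 3, 3, 3, 3],
    'value': [1000, 1450, 1040, 1500, 1090, 1060, 1040, 1520, 1490],
    'timestamp': pd.to_datetime([
        '2018-05-28 03:40:00', '2018-05-28 04:40:01', '2018-05-28 05:40:00',
        '2018-05-29 02:40:00', '2018-05-29 04:40:00',
        '2018-05-18 03:40:00', '2018-05-18 05:40:00',
        '2018-05-19 03:40:00', '2018-05-19 04:40:00',
    ]),
})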

In my previous question, How do I building dt.hour in 2 days, the customer's first appearance at 2018-05-28 03:40:00.000 was labeled Day1 - 3, but for another purpose it should be Day1 - 0. That is, the label should encode the whole hours elapsed since each customer's first observation (day = elapsed // 24 + 1, hour = elapsed % 24), so the output should be:

customer_id   value    timestamp                hour
1             1000     2018-05-28 03:40:00.000  Day1 - 0
1             1450     2018-05-28 04:40:01.000  Day1 - 1
1             1040     2018-05-28 05:40:00.000  Day1 - 2
1             1500     2018-05-29 02:40:00.000  Day1 - 23
1             1090     2018-05-29 04:40:00.000  Day2 - 1
3             1060     2018-05-18 03:40:00.000  Day1 - 0
3             1040     2018-05-18 05:40:00.000  Day1 - 2
3             1520     2018-05-19 03:40:00.000  Day2 - 0
3             1490     2018-05-19 04:40:00.000  Day2 - 1

1 Answer:

Answer 0 (score: 1)

I think you need to add all the missing hours so that cumcount counts correctly:

import pandas as pd

# floor each timestamp to the hour (assumes 'timestamp' is already datetime64)
df['timestamp'] = df['timestamp'].dt.floor('h')
# reindex each customer to a full hourly frequency, inserting the missing hours as NaN rows
df = df.set_index('timestamp').groupby('customer_id').apply(lambda x: x.asfreq('h'))
# cumulative count per customer = whole hours elapsed since the first observation
df['hour'] = df.groupby(level=0).cumcount()
# drop the filler rows and the duplicated customer_id column (it is now index level 0)
df = df.dropna(subset=['customer_id']).drop(columns='customer_id').reset_index()

# split the elapsed hours into a day number and an hour-of-day label
df['hour'] = ('Day' + (df['hour'] // 24).add(1).astype(str) +
              ' - ' + (df['hour'] % 24).astype(str))
print(df)
   customer_id           timestamp   value       hour
0            1 2018-05-28 03:00:00  1000.0   Day1 - 0
1            1 2018-05-28 04:00:00  1450.0   Day1 - 1
2            1 2018-05-28 05:00:00  1040.0   Day1 - 2
3            1 2018-05-29 02:00:00  1500.0  Day1 - 23
4            1 2018-05-29 04:00:00  1090.0   Day2 - 1
5            3 2018-05-18 03:00:00  1060.0   Day1 - 0
6            3 2018-05-18 05:00:00  1040.0   Day1 - 2
7            3 2018-05-19 03:00:00  1520.0   Day2 - 0
8            3 2018-05-19 04:00:00  1490.0   Day2 - 1
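If filling in the missing hours is not needed for anything else, a shorter alternative (a sketch, not taken from the original answer) computes the elapsed hours per customer directly from each group's first floored timestamp, starting again from the original df:

import pandas as pd

# whole hours elapsed since each customer's first observation, floored to the hour
start = df.groupby('customer_id')['timestamp'].transform('min').dt.floor('h')
elapsed = (df['timestamp'].dt.floor('h') - start) // pd.Timedelta(hours=1)

# same Day/hour labelling as above, without reindexing to an hourly frequency
df['hour'] = ('Day' + (elapsed // 24 + 1).astype(str) +
              ' - ' + (elapsed % 24).astype(str))

This keeps the original rows (and the integer dtype of value) intact, since no filler rows are ever created.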