My pandas pivot df with MultiIndex columns looks like this:
Date 2019-10-01 11:00 2019-10-01 12:00 2019-10-01 13:00 ... 2019-10-29 17:00
ID 25 24 25 ... 24
H_name
Hospital1 12 15 16 ... 12
Hospital2 10 17 14 ... 12
Hospital3 15 20 12 ... 12
I would like to get:
Date 2019-10-01 2019-10-02 2019-10-03
ID 25.45 24.33 23.71
H_name
Hospital1 253 287 261
Hospital2 212 232 264
Hospital3 221 219 223
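The values for each "H_name" are the sum over all time slots of the day, and "ID" is the mean over all time slots of the day. Thanks for your help =)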
My df before pivoting:
H_name Date ID Value
0 Hospital1 2019-10-01 11:00 25 12
1 Hospital2 2019-10-01 11:00 25 10
2 Hospital3 2019-10-01 11:00 25 15
3 Hospital1 2019-10-01 12:00 24 15
4 Hospital2 2019-10-01 12:00 24 17
5 Hospital3 2019-10-01 12:00 24 20
.... .... ... ...
680 Hospital1 2019-10-30 15:00 20 11
681 Hospital2 2019-10-30 15:00 20 18
682 Hospital3 2019-10-30 15:00 20 17
Answer 0 (score: 0)
If I understand you correctly, you want to group the data by date (Value aggregated by sum, ID aggregated by mean) and only then create the pivot table:
import numpy as np
import pandas as pd
h_name = ['Hospital1', 'Hospital2', 'Hospital3', 'Hospital1', 'Hospital2', 'Hospital3',
'Hospital1', 'Hospital2', 'Hospital3', 'Hospital1', 'Hospital2', 'Hospital3']
date = ['2019-10-01 11:00', '2019-10-01 11:00', '2019-10-01 11:00', '2019-10-01 12:00', '2019-10-01 12:00', '2019-10-01 12:00',
'2019-10-02 11:00', '2019-10-02 11:00', '2019-10-02 11:00', '2019-10-02 12:00', '2019-10-02 12:00', '2019-10-02 12:00']
ids = [25, 25, 25, 24, 24, 24,
23, 23, 23, 22, 22, 22]
value = [12, 10, 15, 15, 17, 20,
15, 16, 17, 14, 13, 22]
df = pd.DataFrame({'H_name': h_name, 'Date': date, 'ID': ids, 'Value': value})
df['Date'] = pd.to_datetime(df['Date'], utc=False)
print(df)
The data in df looks like this:
H_name Date ID Value
0 Hospital1 2019-10-01 11:00:00 25 12
1 Hospital2 2019-10-01 11:00:00 25 10
2 Hospital3 2019-10-01 11:00:00 25 15
3 Hospital1 2019-10-01 12:00:00 24 15
4 Hospital2 2019-10-01 12:00:00 24 17
5 Hospital3 2019-10-01 12:00:00 24 20
6 Hospital1 2019-10-02 11:00:00 23 15
7 Hospital2 2019-10-02 11:00:00 23 16
8 Hospital3 2019-10-02 11:00:00 23 17
9 Hospital1 2019-10-02 12:00:00 22 14
10 Hospital2 2019-10-02 12:00:00 22 13
11 Hospital3 2019-10-02 12:00:00 22 22
Then:
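# Calendar day of each timestamp, used as the grouping key
df['Date_1'] = df['Date'].dt.date
# Per hospital and day: average the IDs and sum the values
daily = df.groupby(['H_name', 'Date_1']).agg({'ID': 'mean', 'Value': 'sum'}).reset_index()
# Each (H_name, Date_1, ID) combination is now a single row, so the default aggfunc leaves the sums unchanged
print(daily.pivot_table(index='H_name', columns=['Date_1', 'ID'], values='Value'))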
This prints:
Date_1 2019-10-01 2019-10-02
ID 24.5 22.5
H_name
Hospital1 27 29
Hospital2 27 29
Hospital3 35 39
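Note that the approach above averages ID per hospital and day, which only gives a single ID per date column when every hospital shares the same ID at each timestamp (as in the sample data). A possible variant, sketched here under that caveat, computes the daily ID mean over all rows of the day and attaches it as the second column level afterwards. The names `Day`, `daily_value` and `daily_id` are illustrative and not from the original post.

```python
import pandas as pd

# Small long-format sample in the same shape as the pre-pivot df.
df = pd.DataFrame({
    "H_name": ["Hospital1", "Hospital2", "Hospital3"] * 4,
    "Date": pd.to_datetime(
        ["2019-10-01 11:00"] * 3 + ["2019-10-01 12:00"] * 3
        + ["2019-10-02 11:00"] * 3 + ["2019-10-02 12:00"] * 3
    ),
    "ID": [25] * 3 + [24] * 3 + [23] * 3 + [22] * 3,
    "Value": [12, 10, 15, 15, 17, 20, 15, 16, 17, 14, 13, 22],
})

# "Day" is a hypothetical helper column: the timestamp truncated to midnight.
daily = df.assign(Day=df["Date"].dt.normalize())

# Sum of Value per hospital (rows) and day (columns).
daily_value = daily.pivot_table(
    index="H_name", columns="Day", values="Value", aggfunc="sum"
)

# Mean of ID per day, taken over all rows of that day.
daily_id = daily.groupby("Day")["ID"].mean()

# Attach the daily ID mean as a second column level next to the date.
daily_value.columns = pd.MultiIndex.from_arrays(
    [daily_value.columns.date, daily_id.reindex(daily_value.columns).to_numpy()],
    names=["Date", "ID"],
)
print(daily_value)
```

On this sample the daily totals agree with the pivot printed above (27 and 29 for Hospital1, with ID means 24.5 and 22.5); the two approaches would differ only if hospitals reported different IDs within the same day.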