Computing column and index values in a multi-index pandas pivot table

Time: 2019-12-21 21:06:55

Tags: python pandas dataframe pivot multi-index

My multi-index pandas pivot df looks like this:

Date       2019-10-01 11:00  2019-10-01 12:00  2019-10-01 13:00  ...  2019-10-29 17:00
ID                       25                24                25  ...                24
H_name
Hospital1                12                15                16  ...                12
Hospital2                10                17                14  ...                12
Hospital3                15                20                12  ...                12

I would like to get:

Date       2019-10-01  2019-10-02  2019-10-03
ID              25.45       24.33       23.71
H_name
Hospital1         253         287         261
Hospital2         212         232         264
Hospital3         221         219         223

The values in each "H_name" row are the sum over all hours of the day, and "ID" is the mean over all hours of the day. Thanks for your help =)

My df before pivoting:

        H_name            Date              ID      Value  
0     Hospital1     2019-10-01  11:00       25        12
1     Hospital2     2019-10-01  11:00       25        10
2     Hospital3     2019-10-01  11:00       25        15
3     Hospital1     2019-10-01  12:00       24        15
4     Hospital2     2019-10-01  12:00       24        17
5     Hospital3     2019-10-01  12:00       24        20
        ....              ....              ...       ...
680   Hospital1     2019-10-30  15:00       20        11
681   Hospital2     2019-10-30  15:00       20        18
682   Hospital3     2019-10-30  15:00       20        17

1 answer:

Answer 0 (score: 0)

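If I understand you correctly, you want to aggregate the data by day first (summing Value with np.sum and averaging ID with np.mean) and only then build the pivot table: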

import numpy as np
import pandas as pd

h_name = ['Hospital1', 'Hospital2', 'Hospital3', 'Hospital1', 'Hospital2', 'Hospital3',
          'Hospital1', 'Hospital2', 'Hospital3', 'Hospital1', 'Hospital2', 'Hospital3']

date = ['2019-10-01  11:00', '2019-10-01  11:00', '2019-10-01  11:00', '2019-10-01  12:00', '2019-10-01  12:00', '2019-10-01  12:00',
        '2019-10-02  11:00', '2019-10-02  11:00', '2019-10-02  11:00', '2019-10-02  12:00', '2019-10-02  12:00', '2019-10-02  12:00']

ids = [25, 25, 25, 24, 24, 24,
       23, 23, 23, 22, 22, 22]

value = [12, 10, 15, 15, 17, 20,
         15, 16, 17, 14, 13, 22]

df = pd.DataFrame({'H_name': h_name, 'Date': date, 'ID': ids, 'Value': value})
df['Date'] = pd.to_datetime(df['Date'], utc=False)
print(df)

The data in df looks like this:

       H_name                Date  ID  Value
0   Hospital1 2019-10-01 11:00:00  25     12
1   Hospital2 2019-10-01 11:00:00  25     10
2   Hospital3 2019-10-01 11:00:00  25     15
3   Hospital1 2019-10-01 12:00:00  24     15
4   Hospital2 2019-10-01 12:00:00  24     17
5   Hospital3 2019-10-01 12:00:00  24     20
6   Hospital1 2019-10-02 11:00:00  23     15
7   Hospital2 2019-10-02 11:00:00  23     16
8   Hospital3 2019-10-02 11:00:00  23     17
9   Hospital1 2019-10-02 12:00:00  22     14
10  Hospital2 2019-10-02 12:00:00  22     13
11  Hospital3 2019-10-02 12:00:00  22     22
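
The next step reduces each timestamp to its calendar day with `.dt.date`. As a side note, here is a minimal sketch of how that compares with `.dt.normalize()`, which keeps the datetime dtype. The variable names `ts`, `as_date` and `as_midnight` are only illustrative and do not appear in the original answer.

```python
import pandas as pd

# Three sample timestamps spanning two calendar days.
ts = pd.Series(pd.to_datetime(["2019-10-01 11:00",
                               "2019-10-01 12:00",
                               "2019-10-02 11:00"]))

# .dt.date yields plain datetime.date objects, so the Series has object dtype.
as_date = ts.dt.date

# .dt.normalize() keeps a datetime dtype and resets the time part to midnight.
as_midnight = ts.dt.normalize()

print(as_date.dtype)
print(as_midnight.dtype)
print(as_midnight.nunique())  # number of distinct days
```

Either form works as a grouping key. The normalized version may be more convenient if you want to keep using datetime accessors or resampling afterwards.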

Then:

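# Reduce each timestamp to its calendar day
df['Date_1'] = df.Date.dt.date

# Per hospital and day: mean of ID, sum of Value
df = df.set_index('H_name').groupby(['H_name', 'Date_1']).agg({'ID': np.mean, 'Value': np.sum})
# Hospitals as rows; (day, mean ID) as the two column levels
print(df.pivot_table(index='H_name', columns=['Date_1', 'ID'], values='Value'))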

Output:

Date_1    2019-10-01 2019-10-02
ID              24.5       22.5
H_name                         
Hospital1         27         29
Hospital2         27         29
Hospital3         35         39
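
One caveat: the approach above averages ID per hospital, so it only yields one column per day when every hospital has the same ID values within that day (as in the sample data). A possible variant, sketched below under that reading of the question, computes the daily ID mean over all rows of the day and attaches it as a second column level afterwards. The helper column `Day` and the names `totals` and `daily_id` are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data mirroring the answer above: 3 hospitals, 2 days, 2 hourly slots.
df = pd.DataFrame({
    "H_name": ["Hospital1", "Hospital2", "Hospital3"] * 4,
    "Date": pd.to_datetime(
        ["2019-10-01 11:00"] * 3 + ["2019-10-01 12:00"] * 3
        + ["2019-10-02 11:00"] * 3 + ["2019-10-02 12:00"] * 3
    ),
    "ID": [25] * 3 + [24] * 3 + [23] * 3 + [22] * 3,
    "Value": [12, 10, 15, 15, 17, 20, 15, 16, 17, 14, 13, 22],
})

# Hypothetical helper column holding the calendar day of each timestamp.
df["Day"] = df["Date"].dt.date

# Daily total of Value: hospitals as rows, days as columns.
totals = df.pivot_table(index="H_name", columns="Day",
                        values="Value", aggfunc="sum")

# Daily mean of ID over all rows of that day, regardless of hospital.
daily_id = df.groupby("Day")["ID"].mean()

# Attach the daily ID mean as a second column level, aligned by day.
totals.columns = pd.MultiIndex.from_arrays(
    [totals.columns, daily_id.reindex(totals.columns).to_numpy()],
    names=["Day", "ID"],
)
print(totals)
```

On this sample the result matches the table printed above. Passing the aggregators as strings (`"sum"`, `"mean"`) rather than NumPy functions also avoids the deprecation warning that recent pandas versions emit for `np.sum` and `np.mean` in `agg`.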