Pandas MultiIndex DataFrame - group by an external Series?

Asked: 2017-07-12 15:44:17

Tags: python pandas numpy

I have a DataFrame with a two-level index; it looks like this:

>>> by_hour
                                  pr         da     delta   delta_sq
node         timestamputc                                           
A            1             20.540423  21.093659  0.553237   9.869976
B            1             17.675580  18.183104  0.507524  11.474762
C            1             16.257307  16.961944  0.704638  68.023460
...                              ...        ...       ...        ...
X            24            20.649155  20.805145  0.155990  43.176084
Y            24            20.677271  21.183925  0.506655  47.746125
Z            24            21.455556  21.725556  0.270000  39.393092

[60312 rows x 4 columns] 

I have another DataFrame with a single-level index that matches level 0 of by_hour's index:

>>> nodes
               type
node                 
A                type 1
B                type 1
C                type 2
...                 ...
X                type 3
Y                type 1
Z                type 2

[2513 rows x 1 columns]

I want to group the first DataFrame by the "type" column of the second DataFrame while keeping the level-1 index, to get output like this:

                            pr        da        delta      delta_sq 
type     timestamputc
type 1   1
         2
         ...
type 2   1
         2
...
type n   1
         ...
         24

How can I do this? Is it possible without creating an intermediate DataFrame?

1 answer:

Answer 0 (score: 4)

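This aggregates the DataFrame by type and timestamp, using arrays derived from the index as group keys rather than an intermediate DataFrame: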

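# Look up the type of each row's node (one value per row of by_hour)
node_type = nodes.loc[by_hour.index.get_level_values('node'), 'type'].values
# Level-1 values, kept as the second group key
timestamp = by_hour.index.get_level_values('timestamputc')
by_hour.groupby([node_type, timestamp]).sum()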


                            pr         da     delta   delta_sq
       timestamputc                                           
type 1 1             38.216003  39.276763  1.060761  21.344738
       24            20.677271  21.183925  0.506655  47.746125
type 2 1             16.257307  16.961944  0.704638  68.023460
       24            21.455556  21.725556  0.270000  39.393092
type 3 24            20.649155  20.805145  0.155990  43.176084
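
For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same approach. The sample values, the three nodes, and the two timestamps are made up for illustration and are not from the question; only the names by_hour, nodes, node, timestamputc, and type come from the post.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's frames (values are made up).
index = pd.MultiIndex.from_product(
    [["A", "B", "C"], [1, 2]], names=["node", "timestamputc"]
)
by_hour = pd.DataFrame(
    {
        "pr": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        "da": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    },
    index=index,
)
nodes = pd.DataFrame(
    {"type": ["type 1", "type 1", "type 2"]},
    index=pd.Index(["A", "B", "C"], name="node"),
)

# Same steps as the answer: one type per row, plus the level-1 values.
node_type = nodes.loc[by_hour.index.get_level_values("node"), "type"].values
timestamp = by_hour.index.get_level_values("timestamputc")
result = by_hour.groupby([node_type, timestamp]).sum()

# Optional variation: wrap the array in a named Index before grouping,
# with the intent of labeling the first level of the result "type".
named = by_hour.groupby([pd.Index(node_type, name="type"), timestamp]).sum()

print(result)
```

Nodes A and B are both "type 1" here, so their rows are summed per timestamp, while C stays on its own under "type 2". Replacing sum() with mean() or another aggregation should work the same way, depending on what the grouped values are meant to represent.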