I have a DataFrame with a two-level index; it looks like this:
>>> by_hour
                          pr         da     delta   delta_sq
node timestamputc
A    1             20.540423  21.093659  0.553237   9.869976
B    1             17.675580  18.183104  0.507524  11.474762
C    1             16.257307  16.961944  0.704638  68.023460
...                      ...        ...       ...        ...
X    24            20.649155  20.805145  0.155990  43.176084
Y    24            20.677271  21.183925  0.506655  47.746125
Z    24            21.455556  21.725556  0.270000  39.393092
[60312 rows x 4 columns]
I have another DataFrame with a single index, which matches level 0 of
the by_hour index:
>>> nodes
        type
node
A     type 1
B     type 1
C     type 2
...      ...
X     type 3
Y     type 1
Z     type 2
[2513 rows x 1 columns]
I want to group the first DataFrame by the "type" column of the second DataFrame while keeping the level-1 index, to get output like this:
                     pr  da  delta  delta_sq
type   timestamputc
type 1 1
       2
       ...
type 2 1
       2
       ...
type n 1
       ...
       24
How can I do this? Is it possible without creating an intermediate DataFrame?
Answer 0 (score: 4)
This looks up each row's type in nodes and aggregates the DataFrame by type and timestamp:
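# Type of each row in by_hour, looked up from its level-0 "node" label
node_type = nodes.loc[by_hour.index.get_level_values('node'), 'type'].values
# Level-1 values, kept as the second grouping key
timestamp = by_hour.index.get_level_values('timestamputc')
by_hour.groupby([node_type, timestamp]).sum()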
                            pr         da     delta   delta_sq
       timestamputc
type 1 1             38.216003  39.276763  1.060761  21.344738
       24            20.677271  21.183925  0.506655  47.746125
type 2 1             16.257307  16.961944  0.704638  68.023460
       24            21.455556  21.725556  0.270000  39.393092
type 3 24            20.649155  20.805145  0.155990  43.176084
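
For anyone who wants to try this, here is a minimal self-contained sketch of the same approach. It assumes only the six rows visible in the question (not the full 60312-row frame), and it wraps the looked-up types in a named pd.Index so that the first level of the result is labelled "type", as in the desired output. That wrapping is my addition and is not required for the grouping itself.

```python
import pandas as pd

# Toy data: only the six rows shown in the question.
index = pd.MultiIndex.from_tuples(
    [("A", 1), ("B", 1), ("C", 1), ("X", 24), ("Y", 24), ("Z", 24)],
    names=["node", "timestamputc"],
)
by_hour = pd.DataFrame(
    {
        "pr": [20.540423, 17.675580, 16.257307, 20.649155, 20.677271, 21.455556],
        "da": [21.093659, 18.183104, 16.961944, 20.805145, 21.183925, 21.725556],
        "delta": [0.553237, 0.507524, 0.704638, 0.155990, 0.506655, 0.270000],
        "delta_sq": [9.869976, 11.474762, 68.023460, 43.176084, 47.746125, 39.393092],
    },
    index=index,
)
nodes = pd.DataFrame(
    {"type": ["type 1", "type 1", "type 2", "type 3", "type 1", "type 2"]},
    index=pd.Index(["A", "B", "C", "X", "Y", "Z"], name="node"),
)

# Look up the type of every row from its "node" label. Giving the key a
# name makes pandas label the first level of the result "type".
node_type = pd.Index(
    nodes.loc[by_hour.index.get_level_values("node"), "type"].to_numpy(),
    name="type",
)
timestamp = by_hour.index.get_level_values("timestamputc")

# Group by the two array-like keys; no intermediate DataFrame is built.
result = by_hour.groupby([node_type, timestamp]).sum()
print(result)
```

The two grouping keys are plain array-likes with one entry per row of by_hour, so nothing is merged or joined. Replace .sum() with .mean() or another aggregation if a sum is not what you need.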