Say I have the following dataframe:
a b c Sce1 Sce2 Sce3 Sce4 Sce5 Sc6
Animal Ground Dog 0.0 0.9 0.5 0.0 0.3 0.4
Animal Ground Cat 0.6 0.5 0.3 0.5 1.0 0.2
Animal Air Eagle 1.0 0.1 0.1 0.6 0.9 0.1
Animal Air Owl 0.3 0.1 0.5 0.3 0.5 0.9
Object Metal Car 0.3 0.3 0.8 0.6 0.5 0.6
Object Metal Bike 0.5 0.1 0.4 0.7 0.4 0.2
Object Wood Chair 0.9 0.6 0.1 0.9 0.2 0.8
Object Wood Table 0.9 0.6 0.6 0.1 0.9 0.7
I would like to create a MultiIndex that contains the sum at each level. The output would look like this:
a       b       c        Sce1  Sce2  Sce3  Sce4  Sce5  Sce6
Animal                    1.9   1.6   1.4   1.3   2.7   1.6
        Ground            0.6   1.4   0.8   0.5   1.3   0.6
                Dog       0.0   0.9   0.5   0.0   0.3   0.4
                Cat       0.6   0.5   0.3   0.5   1.0   0.2
        Air               1.3   0.2   0.7   0.8   1.4   1.0
                Eagle     1.0   0.1   0.1   0.6   0.9   0.1
                Owl       0.3   0.1   0.5   0.3   0.5   0.9
Object                    2.6   1.6   1.8   2.3   2.0   2.3
        Metal             0.8   0.3   1.1   1.3   0.9   0.8
                Car       0.3   0.3   0.8   0.6   0.5   0.6
                Bike      0.5   0.1   0.4   0.7   0.4   0.2
        Wood              1.8   1.3   0.6   1.0   1.1   1.5
                Chair     0.9   0.6   0.1   0.9   0.2   0.8
                Table     0.9   0.6   0.6   0.1   0.9   0.7
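Currently I use a loop to create three separate dataframes, one per level, and then combine them in Excel, as shown below. If possible, I would like to do this calculation in Python instead.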
dfs = []
for lvl in range(1, 4):
    # sum the numeric columns at each grouping depth: a, then a+b, then a+b+c
    dfs.append(df.groupby(list(df.columns)[0:lvl], as_index=False).sum(numeric_only=True))
Thanks a lot in advance.
Answer 0 (score: 11)
Liberal use of MAGIC
pd.concat([
    df.assign(
        **{x: 'Total' for x in 'abc'[i:]}
    ).groupby(list('abc')).sum() for i in range(4)
]).sort_index()
                      Sce1  Sce2  Sce3  Sce4  Sce5   Sc6
a      b      c
Animal Air    Eagle    1.0   0.1   0.1   0.6   0.9   0.1
              Owl      0.3   0.1   0.5   0.3   0.5   0.9
              Total    1.3   0.2   0.6   0.9   1.4   1.0
       Ground Cat      0.6   0.5   0.3   0.5   1.0   0.2
              Dog      0.0   0.9   0.5   0.0   0.3   0.4
              Total    0.6   1.4   0.8   0.5   1.3   0.6
       Total  Total    1.9   1.6   1.4   1.4   2.7   1.6
Object Metal  Bike     0.5   0.1   0.4   0.7   0.4   0.2
              Car      0.3   0.3   0.8   0.6   0.5   0.6
              Total    0.8   0.4   1.2   1.3   0.9   0.8
       Total  Total    2.6   1.6   1.9   2.3   2.0   2.3
       Wood   Chair    0.9   0.6   0.1   0.9   0.2   0.8
              Table    0.9   0.6   0.6   0.1   0.9   0.7
              Total    1.8   1.2   0.7   1.0   1.1   1.5
Total  Total  Total    4.5   3.2   3.3   3.7   4.7   3.9
I can get exactly what you asked for with
pd.concat([
    df.assign(
        **{x: '' for x in 'abc'[i:]}
    ).groupby(list('abc')).sum() for i in range(1, 4)
]).sort_index()
                      Sce1  Sce2  Sce3  Sce4  Sce5   Sc6
a      b      c
Animal                 1.9   1.6   1.4   1.4   2.7   1.6
       Air             1.3   0.2   0.6   0.9   1.4   1.0
              Eagle    1.0   0.1   0.1   0.6   0.9   0.1
              Owl      0.3   0.1   0.5   0.3   0.5   0.9
       Ground          0.6   1.4   0.8   0.5   1.3   0.6
              Cat      0.6   0.5   0.3   0.5   1.0   0.2
              Dog      0.0   0.9   0.5   0.0   0.3   0.4
Object                 2.6   1.6   1.9   2.3   2.0   2.3
       Metal           0.8   0.4   1.2   1.3   0.9   0.8
              Bike     0.5   0.1   0.4   0.7   0.4   0.2
              Car      0.3   0.3   0.8   0.6   0.5   0.6
       Wood            1.8   1.2   0.7   1.0   1.1   1.5
              Chair    0.9   0.6   0.1   0.9   0.2   0.8
              Table    0.9   0.6   0.6   0.1   0.9   0.7
As for how! I'll leave that as an exercise for the reader.
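For anyone who wants a head start on that exercise, here is one possible unpacking of the one-liner. This is a sketch, not the answerer's own explanation. For each i, assign overwrites the key columns from position i onward with a constant label, so grouping on all three keys collapses those levels into a single subtotal row. The cut-down sample frame and the names keys, collapsed and frames are assumptions made for illustration.

```python
import pandas as pd

# Illustrative subset of the question's data (two scenario columns only).
df = pd.DataFrame({
    "a": ["Animal", "Animal", "Object", "Object"],
    "b": ["Ground", "Air", "Metal", "Wood"],
    "c": ["Dog", "Eagle", "Car", "Chair"],
    "Sce1": [0.0, 1.0, 0.3, 0.9],
    "Sce2": [0.9, 0.1, 0.3, 0.6],
})

keys = list("abc")
frames = []
for i in range(1, 4):
    # Blank out the key columns from position i onward:
    # i=1 keeps only 'a', i=2 keeps 'a' and 'b', i=3 keeps all three
    # (so the last pass just reproduces the original rows).
    collapsed = df.assign(**{k: "" for k in keys[i:]})
    frames.append(collapsed.groupby(keys).sum())

# The empty string sorts before any non-empty label, so after sorting
# each subtotal row sits directly above the rows it summarises.
result = pd.concat(frames).sort_index()
print(result)
```

Swapping the empty string for 'Total' and starting the range at 0 gives the first variant, where the subtotals sort wherever 'Total' falls alphabetically and a grand-total row is added.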
Answer 1 (score: 0)
You need to do two groupby operations to get the subtotals for each aggregation level, then add these back to the initial DF. Here is a related question.
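Since this answer gives no code, the following is only a rough sketch of what it seems to describe: one groupby per aggregation level, a placeholder label for the key columns that were aggregated away, and a concat back onto the original frame. The 'Total' label, the cut-down sample data and the variable names are all assumptions.

```python
import pandas as pd

# Illustrative subset of the question's data (two scenario columns only).
df = pd.DataFrame({
    "a": ["Animal", "Animal", "Object", "Object"],
    "b": ["Ground", "Air", "Metal", "Wood"],
    "c": ["Dog", "Eagle", "Car", "Chair"],
    "Sce1": [0.0, 1.0, 0.3, 0.9],
    "Sce2": [0.9, 0.1, 0.3, 0.6],
})

value_cols = ["Sce1", "Sce2"]

# Two groupbys: subtotals per 'a', and per ('a', 'b').
by_a = df.groupby("a", as_index=False)[value_cols].sum()
by_ab = df.groupby(["a", "b"], as_index=False)[value_cols].sum()

# Label the key columns that were aggregated away (placeholder is an assumption).
by_a = by_a.assign(b="Total", c="Total")
by_ab = by_ab.assign(c="Total")

# Add the subtotals back to the initial frame and index by the three keys.
combined = (
    pd.concat([df, by_ab, by_a], ignore_index=True)
    .set_index(["a", "b", "c"])
    .sort_index()
)
print(combined)
```

This is more verbose than the assign-based answer, but each subtotal frame is available on its own if you need it separately.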