我通过对分组数据帧运行series.describe()获得了pandas中的多索引。如何按modelName.mean
排序这些系列并仅保留特定字段?
此
summary.sortlevel(1)['kappa']
对它们进行排序,但保留所有其他字段,如count。我怎样才能保留mean
和std
?
这是df的文字表示。
kappa
modelName
biasTotal count 5.000000
mean 0.526183
std 0.013429
min 0.507536
25% 0.519706
50% 0.525565
75% 0.538931
max 0.539175
biasTotalWithDistanceMetricAccount count 5.000000
mean 0.527275
std 0.014218
min 0.506428
25% 0.520438
50% 0.529771
75% 0.538475
max 0.541262
lightGBMbiasTotal count 5.000000
mean 0.531639
std 0.013819
min 0.513363
答案 0 :(得分:1)
你可以这样做:
数据:
In [77]: df
Out[77]:
0
level_1 level_0
a 25% 2.000000
50% 4.000000
75% 7.000000
count 5.000000
max 7.000000
mean 4.400000
min 2.000000
std 2.509980
b 25% 2.000000
50% 6.000000
75% 8.000000
count 5.000000
max 8.000000
mean 5.000000
min 1.000000
std 3.316625
c 25% 3.000000
50% 4.000000
75% 5.000000
count 5.000000
max 8.000000
mean 4.000000
min 0.000000
std 2.915476
d 25% 4.000000
50% 8.000000
75% 8.000000
count 5.000000
max 9.000000
mean 6.000000
min 1.000000
std 3.391165
解决方案:
In [78]: df.loc[pd.IndexSlice[:, ['mean','std']], :]
Out[78]:
0
level_1 level_0
a mean 4.400000
std 2.509980
b mean 5.000000
std 3.316625
c mean 4.000000
std 2.915476
d mean 6.000000
std 3.391165
设定:
df = (pd.DataFrame(np.random.randint(0,10,(5,4)),columns=list('abcd'))
.describe()
.stack()
.reset_index()
.set_index(['level_1','level_0'])
.sort_index()
)