I have a DataFrameGroupBy object named grouped. I want to use it to aggregate some data, like this:
aggregated = grouped.aggregate([np.sum, np.mean], axis=1)
This returns a DataFrame of the following form:
aggregated[:3].to_dict()
"""
{('VALUE1', 'sum'): {
('US10adam034', 'PRCP'): 701,
('US10adam036', 'PRCP'): 1015,
('US10adam036', 'SNOW'): 46},
('VALUE1', 'mean'): {
('US10adam034', 'PRCP'): 100.14285714285714,
('US10adam036', 'PRCP'): 145.0,
('US10adam036', 'SNOW'): 46.0}}
"""
Printing the head gives the following:
                   VALUE1
                      sum        mean
ID          ELEMENT
US10adam034 PRCP      701  100.142857
US10adam036 PRCP     1015  145.000000
            SNOW       46   46.000000
US10adam046 PRCP      790  131.666667
US10adam051 PRCP        5    0.555556
US10adam056 PRCP      540   31.764706
            SNOW       25    1.923077
            SNWD      165   15.000000
This is great. With the grouping index (ID, ELEMENT), it easily computes the sum and mean of my samples. However, I would really like to get this into a single-row format, where ID is unique and the columns are combinations of ELEMENT and (sum|mean). I can almost get there using apply, like this:
def getNewSeries(t):
    # type(t) => Series
    element = t.name[1]  # t.name is a tuple (ID, ELEMENT)
    sum_index = f'{element}sum'
    mean_index = f'{element}mean'
    return pd.Series(t['VALUE1'].values, index=[sum_index, mean_index])

aggregated.apply(getNewSeries, axis=1, result_type='expand')
Printing the head again gives:
                       PRCPmean  PRCPsum   SNOWmean  SNOWsum  SNWDmean  ...
ID          ELEMENT
US10adam034 PRCP     100.142857    701.0        NaN      NaN       NaN
US10adam036 PRCP     145.000000   1015.0        NaN      NaN       NaN
            SNOW            NaN      NaN  46.000000     46.0       NaN
US10adam046 PRCP     131.666667    790.0        NaN      NaN       NaN
US10adam051 PRCP       0.555556      5.0        NaN      NaN       NaN
US10adam056 PRCP      31.764706    540.0        NaN      NaN       NaN
            SNOW            NaN      NaN   1.923077     25.0       NaN
            SNWD            NaN      NaN        NaN      NaN      15.0
I would like my final DataFrame to look like this:
               PRCPmean  PRCPsum   SNOWmean  SNOWsum  SNWDmean  ...
ID
US10adam034  100.142857    701.0        NaN      NaN       NaN
US10adam036  145.000000   1015.0  46.000000     46.0       NaN
US10adam046  131.666667    790.0        NaN      NaN       NaN
US10adam051    0.555556      5.0        NaN      NaN       NaN
US10adam056   31.764706    540.0   1.923077     25.0      15.0
Is there a way to use apply, agg, or transform to aggregate this data into a single row? I also tried writing my own iterator over the unique IDs, but it was very slow. I like the simplicity of using agg to compute the sum/mean.
Answer 0 (score: 2)
I like to use an f-string in a list comprehension. f-string formatting requires Python 3.6+.
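# df is assumed to be the aggregated DataFrame from the question
df_out = df.unstack()['VALUE1']
# After unstack, each column label is a (stat, ELEMENT) tuple, e.g. ('sum', 'PRCP')
df_out.columns = [f'{j}{i}' for i, j in df_out.columns]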
df_out
Output:
             PRCPsum  SNOWsum    PRCPmean  SNOWmean
US10adam034    701.0      NaN  100.142857       NaN
US10adam036   1015.0     46.0  145.000000      46.0
Answer 1 (score: 1)
You can do it like this:
new_df = agg_df.unstack(level=1)
new_df.columns = [c+b for _,b,c in new_df.columns.values]
Output:
             PRCPsum  SNOWsum    PRCPmean  SNOWmean
US10adam034    701.0      NaN  100.142857       NaN
US10adam036   1015.0     46.0  145.000000      46.0
Answer 2 (score: 1)
IIUC
aggregated = grouped['VALUE1'].aggregate([np.sum, np.mean], axis=1)
aggregated=aggregated.unstack()
aggregated.columns=aggregated.columns.map('{0[1]}|{0[0]}'.format)
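To see what this kind of unstack-and-rename produces, here is a minimal sketch on a small made-up frame with the same ID, ELEMENT and VALUE1 columns as the question. The data values are hypothetical, and the string names 'sum' and 'mean' are swapped in for np.sum and np.mean.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "ID": ["US10adam034", "US10adam034", "US10adam036", "US10adam036", "US10adam036"],
    "ELEMENT": ["PRCP", "PRCP", "PRCP", "PRCP", "SNOW"],
    "VALUE1": [1, 3, 10, 20, 46],
})

grouped = df.groupby(["ID", "ELEMENT"])

# One row per (ID, ELEMENT), with 'sum' and 'mean' columns.
aggregated = grouped["VALUE1"].agg(["sum", "mean"])

# Move ELEMENT from the row index into the columns: labels become (stat, ELEMENT).
aggregated = aggregated.unstack()

# Format each (stat, ELEMENT) tuple as "ELEMENT|stat".
aggregated.columns = aggregated.columns.map("{0[1]}|{0[0]}".format)

print(aggregated)
```

Each ID now occupies a single row, and any ELEMENT that does not occur for an ID shows up as NaN in its columns.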
Answer 3 (score: 0)
Check whether reset_index works for what you need.
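Answer 4 (score: 0)
I think you can try reshaping the data with unstack(), which moves the innermost row index level to become the innermost column index level.
You can also use fill_value to replace the NaN values with 0.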
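A rough sketch of that suggestion, assuming a small made-up frame with the question's column names (the values are hypothetical). It passes fill_value=0 to unstack so that missing ID/ELEMENT combinations become 0 instead of NaN:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "ID": ["US10adam034", "US10adam034", "US10adam036", "US10adam036", "US10adam036"],
    "ELEMENT": ["PRCP", "PRCP", "PRCP", "PRCP", "SNOW"],
    "VALUE1": [1, 3, 10, 20, 46],
})

aggregated = df.groupby(["ID", "ELEMENT"])["VALUE1"].agg(["sum", "mean"])

# Move the innermost row level (ELEMENT) into the columns,
# filling combinations that do not exist with 0.
wide = aggregated.unstack(level="ELEMENT", fill_value=0)

# Flatten the (stat, ELEMENT) column labels into names like "PRCPsum".
wide.columns = [f"{element}{stat}" for stat, element in wide.columns]

print(wide)
```

Whether 0 is an appropriate fill depends on the data: a sum of 0 for a missing element may be reasonable, but a mean of 0 can be misleading, in which case leaving NaN is safer.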