I'm running a function in which I group by ID and sum the $ values associated with those IDs, using the following code:
df = df.groupby(['Id'], as_index=False, sort=False)[["Amount"]].sum()
But it does not rename the column, so I tried this:
df = df.groupby(['Id'], as_index=False, sort=False)[["Amount"]].sum().reset_index(name='Total Amount')
But it gave me the error TypeError: reset_index() got an unexpected keyword argument 'name'
So finally, following this post, I tried: Python Pandas Create New Column with Groupby().Sum()
df = df.groupby(['Id'])[["Amount"]].transform('sum')
But it still did not work.
What am I doing wrong?
Answer 0 (score: 7)
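I think you need to remove the parameter as_index=False, select the column as "Amount" rather than [["Amount"]] so that the result is a Series, and then use Series.reset_index. With as_index=False the groupby returns a DataFrame, and DataFrame.reset_index fails because it has no name parameter: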
df = df.groupby('Id', sort=False)["Amount"].sum().reset_index(name ='Total Amount')
Or rename the column first:
d = {'Amount':'Total Amount'}
df = df.rename(columns=d).groupby('Id', sort=False, as_index=False)["Total Amount"].sum()
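To see why name is rejected in the question's version, it can help to check what each form of the groupby returns. The following is a minimal sketch, assuming a small made-up frame with the same Id and Amount columns; the variable names as_frame, as_series and result are only for illustration:

```python
import pandas as pd

# Hypothetical data with the same column names as in the question
df = pd.DataFrame({'Id': [1, 2, 2], 'Amount': [10, 30, 50]})

# as_index=False keeps 'Id' as a column, so the result is a DataFrame
as_frame = df.groupby('Id', as_index=False, sort=False)[['Amount']].sum()
print(type(as_frame).__name__)   # DataFrame

# Selecting a single column without as_index=False gives a Series indexed by 'Id'
as_series = df.groupby('Id', sort=False)['Amount'].sum()
print(type(as_series).__name__)  # Series

# DataFrame.reset_index has no 'name' parameter, hence the TypeError
try:
    as_frame.reset_index(name='Total Amount')
except TypeError as exc:
    print('TypeError:', exc)

# Series.reset_index does accept 'name' and uses it for the values column
result = as_series.reset_index(name='Total Amount')
print(result.columns.tolist())   # ['Id', 'Total Amount']
```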
Sample:
df = pd.DataFrame({'Id':[1,2,2],'Amount':[10, 30,50]})
print (df)
   Amount  Id
0      10   1
1      30   2
2      50   2
df1 = df.groupby('Id', sort=False)["Amount"].sum().reset_index(name ='Total Amount')
print (df1)
   Id  Total Amount
0   1            10
1   2            80
d = {'Amount':'Total Amount'}
df1 = df.rename(columns=d).groupby('Id', sort=False, as_index=False)["Total Amount"].sum()
print (df1)
   Id  Total Amount
0   1            10
1   2            80
But if you need a new column in the original df containing the sum, use transform and assign the output to a new column:
df['Total Amount'] = df.groupby('Id', sort=False)["Amount"].transform('sum')
print (df)
   Amount  Id  Total Amount
0      10   1            10
1      30   2            80
2      50   2            80
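The difference between the two approaches is the shape of the result: sum() returns one row per group, while transform('sum') returns one value per original row, which is why it can be assigned back as a column. A small check on the same sample data (the names per_group and per_row are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Id': [1, 2, 2], 'Amount': [10, 30, 50]})

# Aggregation collapses each group to a single row
per_group = df.groupby('Id', sort=False)['Amount'].sum()
print(len(per_group))   # 2

# transform broadcasts the group total back to every original row
per_row = df.groupby('Id', sort=False)['Amount'].transform('sum')
print(len(per_row))     # 3
print(per_row.index.equals(df.index))  # True

# Because the index matches, the assignment lines up row by row
df['Total Amount'] = per_row
print(df['Total Amount'].tolist())  # [10, 80, 80]
```

This also suggests why the question's last attempt did not help: assigning the result of transform to df itself replaces the whole frame with only the summed column.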
Answer 1 (score: 0)
import pandas as pd
# set up dataframe
df = pd.DataFrame({'colA':['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd'],
                   'colB':['cat', 'cat', 'dog', 'cat', 'dog', 'cat', 'cat', 'dog'],
                   'colC':[1,2,3,4,4,5,6,7], })
print(df)
  colA colB  colC
0    a  cat     1
1    a  cat     2
2    a  dog     3
3    b  cat     4
4    b  dog     4
5    c  cat     5
6    c  cat     6
7    d  dog     7
# group on vals in column A
# get min (within groups) for column B
# get avg (within groups) for column C
df_agg = ( df.groupby(by=['colA'])
             .agg({'colB':'min', 'colC':'mean'})
             .rename(columns={'colB':'colB_grp_min', 'colC':'colC_grp_avg'})
         )
print(df_agg)
     colB_grp_min  colC_grp_avg
colA
a             cat           2.0
b             cat           4.0
c             cat           5.5
d             dog           7.0
# if you want multiple aggregations on the same column, pass a list
# this will return a multiindex
# group on vals in column A
# get min (within groups) for column B
# get avg and max (within groups) for column C
df_agg2 = ( df.groupby(by=['colA'])
              .agg({'colB':'min', 'colC':['mean', 'max']})
              .rename(columns={'colB':'colB_grp_min', 'colC':'colC_grp_multi_index'})
          )
print(df_agg2)
     colB_grp_min  colC_grp_multi_index
              min                  mean  max
colA
a             cat                   2.0    3
b             cat                   4.0    4
c             cat                   5.5    6
d             dog                   7.0    7
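If the MultiIndex columns are not wanted, pandas 0.25 and later also support named aggregation, which sets the output column names inside agg itself so that no separate rename is needed. This is a sketch on the same data, not part of the original answer; the output names colC_grp_avg and colC_grp_max and the variable df_named are just illustrative choices:

```python
import pandas as pd

df = pd.DataFrame({'colA': ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd'],
                   'colB': ['cat', 'cat', 'dog', 'cat', 'dog', 'cat', 'cat', 'dog'],
                   'colC': [1, 2, 3, 4, 4, 5, 6, 7]})

# Each keyword is an output column name; the tuple is (input column, aggregation)
df_named = df.groupby('colA').agg(
    colB_grp_min=('colB', 'min'),
    colC_grp_avg=('colC', 'mean'),
    colC_grp_max=('colC', 'max'),
)
print(df_named.columns.tolist())
# ['colB_grp_min', 'colC_grp_avg', 'colC_grp_max']
```

For an output name that is not a valid Python identifier, such as 'Total Amount', the same pairs can be passed by unpacking a dict, for example agg(**{'Total Amount': ('Amount', 'sum')}).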