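I want to create a DataFrame from an aggregate function. I expected it to produce a DataFrame by default, as this solution says, but it creates a Series instead and I don't know why (Converting a Pandas GroupBy object to DataFrame).
The data is the San Francisco Salaries dataset from Kaggle. My code: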
import pandas as pd
df = pd.read_csv('Salaries.csv')
in: type(df)
out: pandas.core.frame.DataFrame
in: df.head()
out: EmployeeName JobTitle TotalPay TotalPayBenefits Year Status 2BasePay 2OvertimePay 2OtherPay 2Benefits 2Year
0 NATHANIEL FORD GENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY 567595.43 567595.43 2011 NaN 167411.18 0.00 400184.25 NaN 2011-01-01
1 GARY JIMENEZ CAPTAIN III (POLICE DEPARTMENT) 538909.28 538909.28 2011 NaN 155966.02 245131.88 137811.38 NaN 2011-01-01
2 ALBERT PARDINI CAPTAIN III (POLICE DEPARTMENT) 335279.91 335279.91 2011 NaN 212739.13 106088.18 16452.60 NaN 2011-01-01
3 CHRISTOPHER CHONG WIRE ROPE CABLE MAINTENANCE MECHANIC 332343.61 332343.61 2011 NaN 77916.00 56120.71 198306.90 NaN 2011-01-01
4 PATRICK GARDNER DEPUTY CHIEF OF DEPARTMENT,(FIRE DEPARTMENT) 326373.19 326373.19 2011 NaN 134401.60 9737.00 182234.59 NaN 2011-01-01
in: df2=df.groupby(['JobTitle'])['TotalPay'].mean()
type(df2)
out: pandas.core.series.Series
I want df2 to be a DataFrame with the columns 'JobTitle' and 'TotalPay'.

Answer 0 (score: 3)
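Breaking down your code:
df2 = df.groupby(['JobTitle'])['TotalPay'].mean()
The groupby is fine. The misstep is ['TotalPay']. It tells the groupby to run the mean function only on the pd.Series df['TotalPay'] for each group defined by ['JobTitle']. Instead, refer to this column with [['TotalPay']]. Note the double brackets: they select a pd.DataFrame.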
df2 = df.groupby(['JobTitle'])[['TotalPay']].mean()
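
To see the difference without downloading Salaries.csv, here is a minimal sketch on a tiny made-up frame (the job titles and pay values are invented for illustration, not taken from the Kaggle data). It also shows one possible way to turn 'JobTitle' from the index into an ordinary column, which the question seems to ask for; the answer above does not cover that step, so treat it as an assumption about the desired shape.

```python
import pandas as pd

# Made-up stand-in for Salaries.csv; values are illustrative only.
df = pd.DataFrame({
    "JobTitle": ["CLERK", "CLERK", "NURSE"],
    "TotalPay": [100.0, 300.0, 500.0],
})

# Single brackets select one column, so the result is a Series.
as_series = df.groupby(["JobTitle"])["TotalPay"].mean()

# Double brackets select a list of columns, so the result is a DataFrame
# (with 'JobTitle' as the index).
as_frame = df.groupby(["JobTitle"])[["TotalPay"]].mean()

# reset_index() moves 'JobTitle' out of the index into a regular column.
flat = as_frame.reset_index()

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(list(flat.columns))        # ['JobTitle', 'TotalPay']
```

Passing as_index=False to groupby is another way to keep 'JobTitle' as a column; which one reads better is mostly a matter of taste.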