I get KeyError: 'BasePay'. The DataFrame has a BasePay column, but the column is lost once I call mean().
My pandas version is '0.23.3', running on Python 3.6.3.
>>> import numpy as np
>>> import pandas as pd
>>> salDataF = pd.read_csv('Salaries.csv', low_memory=False)
>>> salDataF.head()
Id EmployeeName JobTitle BasePay OvertimePay OtherPay ... TotalPay TotalPayBenefits Year Notes Agency Status
0 1 NATHANIEL FORD GENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY 167411.18 0.0 400184.25 ... 567595.43 567595.43 2011 NaN San Francisco NaN
1 2 GARY JIMENEZ CAPTAIN III (POLICE DEPARTMENT) 155966.02 245131.88 137811.38 ... 538909.28 538909.28 2011 NaN San Francisco NaN
2 3 ALBERT PARDINI CAPTAIN III (POLICE DEPARTMENT) 212739.13 106088.18 16452.6 ... 335279.91 335279.91 2011 NaN San Francisco NaN
3 4 CHRISTOPHER CHONG WIRE ROPE CABLE MAINTENANCE MECHANIC 77916.0 56120.71 198306.9 ... 332343.61 332343.61 2011 NaN San Francisco NaN
4 5 PATRICK GARDNER DEPUTY CHIEF OF DEPARTMENT,(FIRE DEPARTMENT) 134401.6 9737.0 182234.59 ... 326373.19 326373.19 2011 NaN San Francisco NaN
[5 rows x 13 columns]
>>> EmpSal = salDataF.groupby('Year').mean()
KeyboardInterrupt
>>> salDataF.groupby('Year').mean()
Id TotalPay TotalPayBenefits Notes
Year
2011 18080.0 71744.103871 71744.103871 NaN
2012 54542.5 74113.262265 100553.229232 NaN
2013 91728.5 77611.443142 101440.519714 NaN
2014 129593.0 75463.918140 100250.918884 NaN
>>> EmpSal = salDataF.groupby('Year').mean()['BasePay']
Error: KeyError: 'BasePay'
Answer 0 (score: 0)
The problem is that BasePay is not numeric, so salDataF.groupby('Year').mean() excludes it: dropping all non-numeric columns is by design in this pandas version.
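One way to confirm this diagnosis is to inspect the column dtypes before aggregating. The sketch below uses a small made-up frame, not the real Salaries.csv; the "Not Provided" string is a hypothetical stand-in for whatever non-numeric value the file may contain.

```python
import pandas as pd

# Hypothetical toy data standing in for Salaries.csv: one BasePay entry
# is a non-numeric string, so the column is not loaded as float.
df = pd.DataFrame({
    "Year": [2011, 2011, 2012],
    "BasePay": ["100.0", "Not Provided", "300.0"],
    "TotalPay": [150.0, 250.0, 350.0],
})

# BasePay shows a non-numeric dtype (typically object) here.
print(df.dtypes)

# Only columns listed here can take part in a numeric mean().
numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```

If BasePay is missing from the numeric column list, it needs converting before it can be averaged.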
The solution is to try astype first:
salDataF['BasePay'] = salDataF['BasePay'].astype(float)
...and if that fails because of some non-numeric data, use to_numeric with errors='coerce' to convert those values to NaN:
salDataF['BasePay'] = pd.to_numeric(salDataF['BasePay'], errors='coerce')
Then it is best to select the column before calling mean:
EmpSal = salDataF.groupby('Year')['BasePay'].mean()
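Put together, the steps above can be sketched end to end on a toy frame. The data and the "Not Provided" placeholder are assumptions made for illustration and are not taken from the real file.

```python
import pandas as pd

# Hypothetical toy data: BasePay arrives as strings, one of them non-numeric.
df = pd.DataFrame({
    "Year": [2011, 2011, 2012, 2012],
    "BasePay": ["100.0", "Not Provided", "300.0", "500.0"],
})

# Coerce to numbers; anything unparseable becomes NaN.
df["BasePay"] = pd.to_numeric(df["BasePay"], errors="coerce")

# Select the column first, then aggregate. mean() skips NaN by default.
base_pay_by_year = df.groupby("Year")["BasePay"].mean()
print(base_pay_by_year)
```

Selecting the column before aggregating also avoids computing means for columns that are not needed.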