I am working with a dataset that has some missing values, and I am trying to fill them in. Here is part of my code.
table = df.pivot_table(values='LoanAmount', index='Self_Employed' ,columns='Education', aggfunc=np.median)
def fage(x):
    return table.loc[x['Self_Employed'], x['Education']]
df['LoanAmount'].fillna(df[df['LoanAmount'].isnull()].apply(fage, axis=1), inplace=True)
This raises the following error:
KeyError: (nan, 'occurred at index 95')
even though there is a nan at index 95 of the LoanAmount column:
,Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status,TotalIncome,TotalIncome_log
89,LP001310,1,1,0,0,No,5695,4167.0,175.0,360.0,1.0,Semiurban,Y,9862.0,9.196444266784072
90,LP001316,1,1,0,0,No,2958,2900.0,131.0,360.0,1.0,Semiurban,Y,5858.0,8.675563527387679
91,LP001318,1,1,2,0,No,6250,5654.0,188.0,180.0,1.0,Semiurban,Y,11904.0,9.384629757072872
92,LP001319,1,1,2,1,No,3273,1820.0,81.0,360.0,1.0,Urban,Y,5093.0,8.535622326884605
93,LP001322,1,0,0,0,No,4133,0.0,122.0,360.0,1.0,Semiurban,Y,4133.0,8.326758814511733
94,LP001325,1,0,0,1,No,3620,0.0,25.0,120.0,1.0,Semiurban,Y,3620.0,8.194229304819817
95,LP001326,1,0,0,0,,6782,0.0,,360.0,1.0,Urban,N,6782.0,8.822027322685583
96,LP001327,0,1,0,0,No,2484,2302.0,137.0,360.0,1.0,Semiurban,Y,4786.0,8.47345026846832
97,LP001333,1,1,0,0,No,1977,997.0,50.0,360.0,1.0,Semiurban,Y,2974.0,7.9976631270201
98,LP001334,1,1,0,1,No,4188,0.0,115.0,180.0,1.0,Semiurban,Y,4188.0,8.339978571990427
99,LP001343,1,1,0,0,No,1759,3541.0,131.0,360.0,1.0,Semiurban,Y,5300.0,8.575462099540212
100,LP001345,1,1,2,1,No,4288,3263.0,133.0,180.0,1.0,Urban,Y,7551.0,8.929435283803425
101,LP001349,1,0,0,0,No,4843,3806.0,151.0,360.0,1.0,Semiurban,Y,8649.0,9.065198986306513
102,LP001350,1,1,0,0,No,13650,0.0,,360.0,1.0,Urban,Y,13650.0,9.521494800613105
103,LP001356,1,1,0,0,No,4652,3583.0,,360.0,1.0,Semiurban,Y,8235.0,9.016148642611741
104,LP001357,1,1,0,0,No,3816,754.0,160.0,360.0,1.0,Urban,Y,4570.0,8.42726848388825
105,LP001367,1,1,1,0,No,3052,1030.0,100.0,360.0,1.0,Urban,Y,4082.0,8.31434234336979
106,LP001369,1,1,2,0,No,11417,1126.0,225.0,360.0,1.0,Urban,Y,12543.0,9.436918020024674
107,LP001370,1,0,0,1,,7333,0.0,120.0,360.0,1.0,Rural,N,7333.0,8.9001399880938
108,LP001379,1,1,2,0,No,3800,3600.0,216.0,360.0,0.0,Urban,N,7400.0,8.909235279192261
109,LP001384,1,1,3,1,No,2071,754.0,94.0,480.0,1.0,Semiurban,Y,2825.0,7.946263643580541
110,LP001385,1,0,0,0,No,5316,0.0,136.0,360.0,1.0,Urban,Y,5316.0,8.578476419833136
Answer 0 (score: 0)
The problem is the nan values that you look up in table.
table only contains the values 0 and 1 for Education and Yes or No for Self_Employed. It has no nan entry, because nan == nan comparisons never match, so rows with a nan key are left out when the table is built.
In row 95:
Loan_ID LP001326
Gender 1
Married 0
Dependents 0
Education 0
Self_Employed NaN
ApplicantIncome 6782
CoapplicantIncome 0
LoanAmount NaN
Loan_Amount_Term 360
Credit_History 1
Property_Area Urban
Loan_Status N
TotalIncome 6782
TotalIncome_log 8.82203
Name: 95, dtype: object
Self_Employed is nan, which cannot be found in table, so you get the KeyError.
To solve the KeyError problem, consider dropping the rows where Self_Employed or Education is nan, or replace nan in these two columns before creating table:
df['Self_Employed'] = df['Self_Employed'].fillna('Unknown')
df['Education'] = df['Education'].fillna('Unknown')
table = df.pivot_table(values='LoanAmount', index='Self_Employed', columns='Education', aggfunc=np.median)
That way table also covers the rows that previously had nan in those columns.
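As a minimal sketch of the whole fix, the snippet below rebuilds the situation on a small made-up DataFrame (the values are hypothetical and only the three relevant columns are kept). It assumes that labelling missing Self_Employed values as 'Unknown' is acceptable for your analysis, and it uses aggfunc="median" in place of np.median, which should compute the same thing. The helper name lookup is also just for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data mirroring the question's columns (values are made up).
df = pd.DataFrame({
    "Self_Employed": ["No", "No", "Yes", np.nan, np.nan, "No", "Yes"],
    "Education":     [0, 1, 0, 0, 0, 0, 1],
    "LoanAmount":    [100.0, 80.0, 150.0, np.nan, 90.0, np.nan, 120.0],
})

# Rows whose grouping key is NaN are dropped, so the table has no NaN label.
table = df.pivot_table(values="LoanAmount", index="Self_Employed",
                       columns="Education", aggfunc="median")
has_nan_label = bool(table.index.isna().any())

# Give missing keys an explicit label, then rebuild the lookup table.
df["Self_Employed"] = df["Self_Employed"].fillna("Unknown")
table = df.pivot_table(values="LoanAmount", index="Self_Employed",
                       columns="Education", aggfunc="median")

def lookup(row):
    # Median LoanAmount for this row's (Self_Employed, Education) group.
    return table.loc[row["Self_Employed"], row["Education"]]

# Fill only the rows where LoanAmount is missing; assignment aligns on the index.
missing = df["LoanAmount"].isna()
df.loc[missing, "LoanAmount"] = df[missing].apply(lookup, axis=1)
```

One caveat: a group whose LoanAmount values are all missing has no median, so its row can be dropped from the table or its cell can stay nan. It may be worth checking df['LoanAmount'].isna().sum() after filling.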