Question

我正在处理数据集，但它缺少一些值。我正在努力填补这些价值观。这是我的代码的一部分。

table = df.pivot_table(values='LoanAmount', index='Self_Employed' ,columns='Education', aggfunc=np.median)

def fage(x):
    return table.loc[x['Self_Employed'],x['Education']]

df['LoanAmount'].fillna(df[df['LoanAmount'].isnull()].apply(fage, axis=1), inplace=True)

这显示了以下错误。

KeyError: (nan, 'occurred at index 95')

这是数据集的快照。

尽管有nan

，但我不知道为什么会发生此错误

在LoanAmount列的索引95处

从剪贴板复制的数据集

,Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status,TotalIncome,TotalIncome_log
89,LP001310,1,1,0,0,No,5695,4167.0,175.0,360.0,1.0,Semiurban,Y,9862.0,9.196444266784072
90,LP001316,1,1,0,0,No,2958,2900.0,131.0,360.0,1.0,Semiurban,Y,5858.0,8.675563527387679
91,LP001318,1,1,2,0,No,6250,5654.0,188.0,180.0,1.0,Semiurban,Y,11904.0,9.384629757072872
92,LP001319,1,1,2,1,No,3273,1820.0,81.0,360.0,1.0,Urban,Y,5093.0,8.535622326884605
93,LP001322,1,0,0,0,No,4133,0.0,122.0,360.0,1.0,Semiurban,Y,4133.0,8.326758814511733
94,LP001325,1,0,0,1,No,3620,0.0,25.0,120.0,1.0,Semiurban,Y,3620.0,8.194229304819817
95,LP001326,1,0,0,0,,6782,0.0,,360.0,1.0,Urban,N,6782.0,8.822027322685583
96,LP001327,0,1,0,0,No,2484,2302.0,137.0,360.0,1.0,Semiurban,Y,4786.0,8.47345026846832
97,LP001333,1,1,0,0,No,1977,997.0,50.0,360.0,1.0,Semiurban,Y,2974.0,7.9976631270201
98,LP001334,1,1,0,1,No,4188,0.0,115.0,180.0,1.0,Semiurban,Y,4188.0,8.339978571990427
99,LP001343,1,1,0,0,No,1759,3541.0,131.0,360.0,1.0,Semiurban,Y,5300.0,8.575462099540212
100,LP001345,1,1,2,1,No,4288,3263.0,133.0,180.0,1.0,Urban,Y,7551.0,8.929435283803425
101,LP001349,1,0,0,0,No,4843,3806.0,151.0,360.0,1.0,Semiurban,Y,8649.0,9.065198986306513
102,LP001350,1,1,0,0,No,13650,0.0,,360.0,1.0,Urban,Y,13650.0,9.521494800613105
103,LP001356,1,1,0,0,No,4652,3583.0,,360.0,1.0,Semiurban,Y,8235.0,9.016148642611741
104,LP001357,1,1,0,0,No,3816,754.0,160.0,360.0,1.0,Urban,Y,4570.0,8.42726848388825
105,LP001367,1,1,1,0,No,3052,1030.0,100.0,360.0,1.0,Urban,Y,4082.0,8.31434234336979
106,LP001369,1,1,2,0,No,11417,1126.0,225.0,360.0,1.0,Urban,Y,12543.0,9.436918020024674
107,LP001370,1,0,0,1,,7333,0.0,120.0,360.0,1.0,Rural,N,7333.0,8.9001399880938
108,LP001379,1,1,2,0,No,3800,3600.0,216.0,360.0,0.0,Urban,N,7400.0,8.909235279192261
109,LP001384,1,1,3,1,No,2071,754.0,94.0,480.0,1.0,Semiurban,Y,2825.0,7.946263643580541
110,LP001385,1,0,0,0,No,5316,0.0,136.0,360.0,1.0,Urban,Y,5316.0,8.578476419833136

Answer 1

table是您获得nan值时的问题。
- table仅包含Education和0的{{1}}值和1或Self_Employed的{{1}}值，而不包含{ {1}}，因为未考虑Yes个条件。

第95行如下

No

由于nan == nan，在Loan_ID LP001326 Gender 1 Married 0 Dependents 0 Education 0 Self_Employed NaN ApplicantIncome 6782 CoapplicantIncome 0 LoanAmount NaN Loan_Amount_Term 360 Credit_History 1 Property_Area Urban Loan_Status N TotalIncome 6782 TotalIncome_log 8.82203 Name: 95, dtype: object中找不到它，所以您得到了Self_Employed
要解决nan问题，请考虑删除table或KeyError为KeyError的行，或在创建{{之前在这两列中替换Self_Employed 1}}。

Education

数据

我使用此修改后的数据填充了nan的所有条件
- 此数据仅具有nan和table的更多df.Self_Employed.fillna('Unknown', inplace=True) df.Education.fillna('Unknown', inplace=True) table = df.pivot_table(values='LoanAmount', index='Self_Employed' ,columns='Education', aggfunc=np.median) table值

table

KeyError：（nan，“出现在索引95”）

从剪贴板复制的数据集

1 个答案:

数据