Question

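I am working through the Loan Prediction practice problem and trying to fill in the missing values in the data. I obtained the data from here. To complete this problem, I am following this tutorial.

You can find the entire code I am using (file name model.py), along with the data, on GitHub.

The DataFrame looks like this:

After executing the last line (corresponding to line 122 of model.py), I get the following output: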

/home/user/.local/lib/python2.7/site-packages/numpy/lib/arraysetops.py:216: FutureWarning: numpy not_equal will not check object identity in the future. The comparison did not return the same result as suggested by the identity (`is`)) and will change.
  flag = np.concatenate(([True], aux[1:] != aux[:-1]))
/usr/local/lib/python2.7/dist-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
Traceback (most recent call last):
  File "model.py", line 123, in <module>
    classification_model(model, df,predictor_var,outcome_var)
  File "model.py", line 89, in classification_model
    model.fit(data[predictors],data[outcome])
  File "/usr/local/lib/python2.7/dist-packages/sklearn/linear_model/logistic.py", line 1173, in fit
    order="C")
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 521, in check_X_y
    ensure_min_features, warn_on_dtype, estimator)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 407, in check_array
    _assert_all_finite(array)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 58, in _assert_all_finite
    " or a value too large for %r." % X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

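I get this error because of the missing values. How do I fill in these missing values?

The missing values for Self_Employed and LoanAmount are already filled; how do I fill in the rest? Thanks for your help.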

Answer 1

You can use fillna:

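# Assign the result back; calling fillna(..., inplace=True) on a selected
# column may not update df in newer pandas versions.
df['Gender'] = df['Gender'].fillna('no data')
df['Married'] = df['Married'].fillna('no data')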

Or, if you need to fill several columns with the same value:

cols = ['Gender','Married']
df[cols] = df[cols].fillna('no data')
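Before choosing which columns to fill, it can help to count the missing values per column. The following is a minimal sketch, assuming a small made-up DataFrame (the values are illustrative and not taken from the actual loan data); it only uses `isnull()` and `sum()`:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the loan data; the values are made up.
df = pd.DataFrame({'Gender': ['m', 'f', np.nan],
                   'Married': [np.nan, 'yes', 'no'],
                   'LoanAmount': [120.0, 66.0, 141.0]})

# Number of missing values in each column
missing_counts = df.isnull().sum()
print(missing_counts)

# Names of the columns that still contain at least one NaN
cols_with_nan = missing_counts[missing_counts > 0].index.tolist()
print(cols_with_nan)
```

The resulting list can then be passed to fillna in the same way as `cols` above.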

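If you need to fill several columns with different values, you can pass a dict that maps column names to replacement values: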

import numpy as np
import pandas as pd

df = pd.DataFrame({'Gender':['m','f',np.nan],
                   'Married':[np.nan,'yes','no'],
                   'credit history':[1.,np.nan,0]})
print(df)
  Gender Married  credit history
0      m     NaN             1.0
1      f     yes             NaN
2    NaN      no             0.0

d = {'Gender':'no data', 'Married':'no data', 'credit history':0}
df = df.fillna(d)
print(df)
    Gender  Married  credit history
0        m  no data             1.0
1        f      yes             0.0
2  no data       no             0.0
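As an alternative to fixed placeholders, each column can be filled with a value derived from its own data. This is only a sketch of one common choice, not something the answer above prescribes: it assumes the median is acceptable for numeric columns and the most frequent value (mode) for the others. The sample values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; column names follow the example above.
df = pd.DataFrame({'Gender': ['m', 'f', np.nan, 'm'],
                   'Married': [np.nan, 'yes', 'no', 'yes'],
                   'credit history': [1., np.nan, 0., 1.]})

for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        # Numeric column: fill with the column median (an assumption)
        df[col] = df[col].fillna(df[col].median())
    else:
        # Non-numeric column: fill with the most frequent value
        df[col] = df[col].fillna(df[col].mode()[0])

# Confirm that no NaN values remain before fitting a model
print(df.isnull().values.any())
```

Checking `df.isnull().values.any()` before calling `model.fit` is a quick way to confirm that the NaN part of the ValueError has been addressed.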
