ValueError when fitting a model after imputation

Asked: 2019-09-23 04:56:46

Tags: python scikit-learn imputation

I am using Kaggle's Melbourne Housing Dataset to fit a regression model, with Price as the target. The dataset is available on Kaggle.

import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.ensemble.partial_dependence import partial_dependence, plot_partial_dependence
from sklearn.preprocessing import Imputer

cols_to_use = ['Distance', 'Landsize', 'BuildingArea']
data = pd.read_csv('data/melb_house_pricing.csv')
# drop rows where target is NaN
data = data.loc[~(data['Price'].isna())]
y = data.Price
X = data[cols_to_use]
my_imputer = Imputer()
imputed_X = my_imputer.fit_transform(X)

print(f"Contains NaNs in training data: {np.isnan(imputed_X).sum()}")
print(f"Contains NaNs in target data: {np.isnan(y).sum()}")
print(f"Contains Infinity: {np.isinf(imputed_X).sum()}")
print(f"Contains Infinity: {np.isinf(y).sum()}")

my_model = GradientBoostingRegressor()
my_model.fit(imputed_X, y)

# Here we make the plot
my_plots = plot_partial_dependence(my_model,       
                                   features=[0, 2], # column numbers of plots we want to show
                                   X=X,            # raw predictors data.
                                   feature_names=['Distance', 'Landsize', 'BuildingArea'], # labels on graphs
                                   grid_resolution=10) # number of values to plot on x axis

Even though I use sklearn's Imputer, I get the following error:

Contains NaNs in training data: 0
Contains NaNs in target data: 0
Contains Infinity: 0
Contains Infinity: 0
/Users/adimyth/.local/lib/python3.7/site-packages/sklearn/utils/deprecation.py:85: DeprecationWarning: Function plot_partial_dependence is deprecated; The function ensemble.plot_partial_dependence has been deprecated in favour of sklearn.inspection.plot_partial_dependence in  0.21 and will be removed in 0.23.
  warnings.warn(msg, category=DeprecationWarning)
Traceback (most recent call last):
  File "partial_dependency_plots.py", line 29, in <module>
    grid_resolution=10) # number of values to plot on x axis
  File "/Users/adimyth/.local/lib/python3.7/site-packages/sklearn/utils/deprecation.py", line 86, in wrapped
    return fun(*args, **kwargs)
  File "/Users/adimyth/.local/lib/python3.7/site-packages/sklearn/ensemble/partial_dependence.py", line 286, in plot_partial_dependence
    X = check_array(X, dtype=DTYPE, order='C')
  File "/Users/adimyth/.local/lib/python3.7/site-packages/sklearn/utils/validation.py", line 542, in check_array
    allow_nan=force_all_finite == 'allow-nan')
  File "/Users/adimyth/.local/lib/python3.7/site-packages/sklearn/utils/validation.py", line 56, in _assert_all_finite
    raise ValueError(msg_err.format(type_err, X.dtype))
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

As you can see, printing the number of NaNs in imputed_X gives 0. So why do I still get a ValueError? Any help would be appreciated.

1 answer:

Answer 0 (score: 0)

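Just change the call to plot_partial_dependence so that it receives the imputed data rather than the raw X. The model was fitted on imputed_X, but X=X passes the original DataFrame, which still contains NaNs; that is what check_array rejects in the traceback.

my_plots = plot_partial_dependence(my_model,
                                   features=[0, 2], # column numbers of plots we want to show
                                   X=imputed_X,     # imputed predictors, not the raw DataFrame
                                   feature_names=['Distance', 'Landsize', 'BuildingArea'], # labels on graphs
                                   grid_resolution=10) # number of values to plot on x axis

With that change it will work.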