I am trying to use MinMaxScaler() to scale my data to the range 0-1, like this:
x_scaling = x_scale.transform(x)
print("Min:", np.min(x_scaling))
print("Max:", np.max(x_scaling))
The error message I get is:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-75-c862a09c2cc2> in <module>()
--> 120 x_scaling = x_scale.transform(x)
121
122 print("Min:", np.min(x_scaling))
~/anaconda3_501/lib/python3.6/site-packages/sklearn/preprocessing/data.py in transform(self, X)
365 check_is_fitted(self, 'scale_')
366
--> 367 X = check_array(X, copy=self.copy, dtype=FLOAT_DTYPES)
368
369 X *= self.scale_
~/anaconda3_501/lib/python3.6/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
451 % (array.ndim, estimator_name))
452 if force_all_finite:
--> 453 _assert_all_finite(array)
454
455 shape_repr = _shape_repr(array.shape)
~/anaconda3_501/lib/python3.6/site-packages/sklearn/utils/validation.py in _assert_all_finite(X)
42 and not np.isfinite(X).all()):
43 raise ValueError("Input contains NaN, infinity"
---> 44 " or a value too large for %r." % X.dtype)
45
46
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
My data does contain NaN, because I shifted the data up by 1. My DataFrame looks like this:
6 2012-01-01 07:00:00 0.022311 1.677769 6 2.963995
7 2012-01-01 08:00:00 0.014925 2.963995 7 5.062572
8 2012-01-01 09:00:00 0.096465 5.062572 8 7.065042
9 2012-01-01 10:00:00 0.284445 7.065042 9 **NaN**
The error message lists several possible causes. If the NaN is what triggers it, how can I fix this?
Answer 0 (score: 1)
You want to use numpy.nanmin() and numpy.nanmax():
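Return the minimum of an array or the minimum along an axis, ignoring any NaNs. When all-NaN slices are encountered, a RuntimeWarning is raised and NaN is returned for that slice.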
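To illustrate the difference, here is a small sketch. The array values are borrowed from the column in the question purely as sample data:

```python
import numpy as np

# Sample values taken from the question's DataFrame, with the trailing NaN
x = np.array([1.677769, 2.963995, 5.062572, 7.065042, np.nan])

print(np.min(x))     # nan, because NaN propagates through np.min
print(np.nanmin(x))  # 1.677769, NaN is ignored
print(np.nanmax(x))  # 7.065042, NaN is ignored
```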
For example, instead of MinMaxScaler(), create a custom scaler that ignores NaN, as follows:
x_std = (x - np.nanmin(x))/(np.nanmax(x) - np.nanmin(x))
x_scaled = x_std * (max - min) + min
where min, max = feature_range.
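Wrapped into a function, this could look roughly like the sketch below. The name nan_minmax_scale is made up for illustration, and scaling each column separately (axis=0) is an assumption chosen to mimic how MinMaxScaler treats features; the two-line formula above uses the global minimum and maximum instead.

```python
import numpy as np

def nan_minmax_scale(x, feature_range=(0.0, 1.0)):
    """Hypothetical helper: scale each column into feature_range, ignoring NaNs."""
    x = np.asarray(x, dtype=float)
    lo, hi = feature_range
    x_min = np.nanmin(x, axis=0)   # per-column minimum, NaNs skipped
    x_max = np.nanmax(x, axis=0)   # per-column maximum, NaNs skipped
    span = x_max - x_min
    # Guard against division by zero for constant columns
    span = np.where(span == 0, 1.0, span)
    x_std = (x - x_min) / span
    return x_std * (hi - lo) + lo

# Made-up sample data with one missing value
x = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, np.nan]])
x_scaled = nan_minmax_scale(x)
print("Min:", np.nanmin(x_scaled))
print("Max:", np.nanmax(x_scaled))
```

The NaN entries stay NaN in the result, so they still have to be dropped or imputed before the data goes into a model that rejects missing values.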