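I want to scale a column of a data frame so that its values lie between 0 and 1. For this I am using MinMaxScaler, which works fine but sends me mixed messages. I am doing: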
x = df['Activity'].values #returns a numpy array
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)
df['Activity'] = pd.Series(x_scaled)
Message number one from this code is a warning:
DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
Okay, so passing 1d arrays will soon be a thing of the past, so let's reshape it as advised:
x = df['Activity'].values.reshape(-1, 1)
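Now the code does not even run; it throws Exception: Data must be 1-dimensional. So I am confused: 1d arrays are about to be deprecated, yet the data must also be 1d? How do I do this safely? What is the problem here?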
Edit, as requested by @sascha: x looks like this:
array([ 0.00568953, 0.00634314, 0.00718003, ..., 0.01976002,
0.00575024, 0.00183782])
After reshaping:
array([[ 0.00568953],
[ 0.00634314],
[ 0.00718003],
...,
[ 0.01976002],
[ 0.00575024],
[ 0.00183782]])
The entire warning:
/usr/local/lib/python3.5/dist-packages/sklearn/preprocessing/data.py:321: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
/usr/local/lib/python3.5/dist-packages/sklearn/preprocessing/data.py:356: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
The error when I reshape:
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
<ipython-input-132-df180aae2d1a> in <module>()
2 min_max_scaler = preprocessing.MinMaxScaler()
3 x_scaled = min_max_scaler.fit_transform(x)
----> 4 telecom['Activity'] = pd.Series(x_scaled)
/usr/local/lib/python3.5/dist-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
225 else:
226 data = _sanitize_array(data, index, dtype, copy,
--> 227 raise_cast_failure=True)
228
229 data = SingleBlockManager(data, index, fastpath=True)
/usr/local/lib/python3.5/dist-packages/pandas/core/series.py in _sanitize_array(data, index, dtype, copy, raise_cast_failure)
2918 elif subarr.ndim > 1:
2919 if isinstance(data, np.ndarray):
-> 2920 raise Exception('Data must be 1-dimensional')
2921 else:
2922 subarr = _asarray_tuplesafe(data, dtype=dtype)
Exception: Data must be 1-dimensional
Answer 0 (score: 6)
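You just need to drop pd.Series, which only accepts 1-dimensional data, whereas fit_transform returns a 2-D array with a single column: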
import pandas as pd
from sklearn import preprocessing
df = pd.DataFrame({'Activity': [ 0.00568953, 0.00634314, 0.00718003,
0.01976002, 0.00575024, 0.00183782]})
x = df['Activity'].values.reshape(-1, 1) #returns a numpy array
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)
df['Activity'] = x_scaled
Or you can explicitly take the first column of x_scaled:
df['Activity'] = pd.Series(x_scaled[:, 0])
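As a further illustration, here is a minimal sketch of why the two requirements do not actually conflict: the scaler wants 2-D input, and pd.Series wants 1-D input. It assumes a reasonably recent pandas and scikit-learn with default settings. Selecting the column with double brackets and flattening with ravel() are alternatives not used in the answer above, and the flag name series_failed is only there for the demonstration.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({'Activity': [0.00568953, 0.00634314, 0.00718003,
                                0.01976002, 0.00575024, 0.00183782]})

# Double brackets select a one-column DataFrame, which is already 2-D,
# so the scaler accepts it without any reshape.
x_scaled = MinMaxScaler().fit_transform(df[['Activity']])
print(x_scaled.shape)  # (6, 1): one row per sample, one column

# pd.Series only accepts 1-D data, so the 2-D result is rejected.
try:
    pd.Series(x_scaled)
    series_failed = False
except Exception as exc:  # the exact exception type depends on the pandas version
    series_failed = True
    print(type(exc).__name__, exc)

# Flatten to 1-D before writing the values back to the column.
df['Activity'] = x_scaled.ravel()
print(df)
```

Here ravel() does the same job as x_scaled[:, 0]: both hand pandas a 1-D array of the scaled values.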