First, the MWE consists of two files. The goal is to read a CSV into a pandas DataFrame and then rescale all values in each column to the range (-1, 1).
data.csv:
Var1,Var2,Var3
2.1,6.4,5.2
7.9,2.1,1.3
5.0,6.1,6.7
mwe.py:
import pandas as pd
import sklearn.preprocessing
data = pd.read_csv("data.csv")
scaler = sklearn.preprocessing.MinMaxScaler(feature_range = (-1, 1))
number_of_columns = data.shape[1]
indices_of_feature_columns = range(0, number_of_columns)
data[indices_of_feature_columns] = scaler.fit_transform(data[indices_of_feature_columns])
When I run this (Python 2.7.13, sklearn 0.18.1, and pandas 0.20.3), I get a strange error message:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "mwe.py", line 8, in <module>
    data[indices_of_feature_columns] = scaler.fit_transform(data[indices_of_feature_columns])
  File "/home/gavin/miniconda2/lib/python2.7/site-packages/pandas/core/frame.py", line 1958, in __getitem__
    return self._getitem_array(key)
  File "/home/gavin/miniconda2/lib/python2.7/site-packages/pandas/core/frame.py", line 2002, in _getitem_array
    indexer = self.loc._convert_to_indexer(key, axis=1)
  File "/home/gavin/miniconda2/lib/python2.7/site-packages/pandas/core/indexing.py", line 1231, in _convert_to_indexer
    raise KeyError('%s not in index' % objarr[mask])
KeyError: '[0 1 2] not in index'
However, when a friend runs this code with a seemingly identical setup, it works correctly.
Answer 0 (score: 1)
Try this:
import pandas as pd
import sklearn.preprocessing
data = pd.read_csv("data.csv")
scaler = sklearn.preprocessing.MinMaxScaler(feature_range = (-1, 1))
data = scaler.fit_transform(data)
Result:
In [15]: data
Out[15]:
array([[-1.        ,  1.        ,  0.44444444],
       [ 1.        , -1.        , -1.        ],
       [ 0.        ,  0.86046512,  1.        ]])
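The numbers above can be checked by hand. As a minimal sketch, assuming the documented MinMaxScaler formula (scale each column to [0, 1] using its own minimum and maximum, then stretch to the requested feature_range), the same result can be reproduced with plain pandas arithmetic. The names csv_text, df, lo, hi and manual are illustrative only, and the CSV is inlined so the snippet runs without data.csv:

```python
import io

import pandas as pd

# Same contents as data.csv, inlined so the sketch is self-contained.
csv_text = "Var1,Var2,Var3\n2.1,6.4,5.2\n7.9,2.1,1.3\n5.0,6.1,6.7\n"
df = pd.read_csv(io.StringIO(csv_text))

# Target range, matching feature_range=(-1, 1) in the answer.
lo, hi = -1.0, 1.0

# Per-column minimum and maximum (Series indexed by column label).
col_min = df.min()
col_max = df.max()

# Min-max rescaling, applied column by column through label alignment.
manual = lo + (df - col_min) * (hi - lo) / (col_max - col_min)
print(manual)
```

For example, Var2 has minimum 2.1 and maximum 6.4, so 6.1 maps to -1 + (6.1 - 2.1) * 2 / 4.3, which is roughly 0.860465.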
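UPDATE: if you want to keep the scaled data as a DataFrame (run this in place of the last line above, while data is still a DataFrame):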
In [18]: data = pd.DataFrame(scaler.fit_transform(data),
                             index=data.index,
                             columns=data.columns)
In [19]: data
Out[19]:
   Var1      Var2      Var3
0  -1.0  1.000000  0.444444
1   1.0 -1.000000 -1.000000
2   0.0  0.860465  1.000000
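As for the KeyError in the question: indexing a DataFrame with a list of integers, as in data[[0, 1, 2]], looks for columns labelled 0, 1 and 2, and this CSV only has the labels Var1, Var2 and Var3. If the intent is to pick columns by position (for instance to scale only some of them), a rough sketch using .iloc could look like the following. The names csv_text, positions and scaled are illustrative, and the CSV is inlined so the snippet runs on its own:

```python
import io

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Same contents as data.csv, inlined so the sketch is self-contained.
csv_text = "Var1,Var2,Var3\n2.1,6.4,5.2\n7.9,2.1,1.3\n5.0,6.1,6.7\n"
data = pd.read_csv(io.StringIO(csv_text))

scaler = MinMaxScaler(feature_range=(-1, 1))

# Integer positions of the feature columns (all three columns here).
positions = list(range(data.shape[1]))

# .iloc selects by integer position, so no column labelled 0, 1 or 2 is needed.
scaled = scaler.fit_transform(data.iloc[:, positions])

# Write the scaled values back under the original column labels,
# so the result stays a DataFrame.
data[data.columns[positions]] = scaled
print(data)
```

To scale only a subset, positions can be changed to something like [0, 2]; the remaining columns are then left untouched.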