I have written the following code to read a CSV file and apply column-wise normalization:
from sklearn import preprocessing
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# reading Train values
Training ='Training.csv'
df = pd.read_csv(Training)
# drop the last column, then drop the (new) last column, i.e. the last two columns
df = df.drop(df.columns[len(df.loc[1])-1], axis=1)
df = df.drop(df.columns[len(df.loc[1])-1], axis=1)
df.describe()
minmax_scaler= preprocessing.MinMaxScaler()
np_scaled = minmax_scaler.fit_transform(df)
normalized = pd.DataFrame(np_scaled)
normalized.describe()
np.shape(df)
np.shape(normalized)
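My question is: why can't I see the column headers in normalized, even though it has the same shape as df? I also tried reading the CSV file without headers, but then the program crashed: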
..............................
df = pd.read_csv(Training,header=None)
.........................
This gives the following:
ValueError Traceback (most recent call last)
<ipython-input-15-dd18ba2a6204> in <module>()
14 df.describe()
15 minmax_scaler= preprocessing.MinMaxScaler()
---> 16 np_scaled = minmax_scaler.fit_transform(df)
17 normalized = pd.DataFrame(np_scaled)
18 normalized.describe()
c:\users\engine\appdata\local\programs\python\python35\lib\site-packages\sklearn\base.py in fit_transform(self, X, y, **fit_params)
492 if y is None:
493 # fit method of arity 1 (unsupervised transformation)
--> 494 return self.fit(X, **fit_params).transform(X)
495 else:
496 # fit method of arity 2 (supervised transformation)
c:\users\engine\appdata\local\programs\python\python35\lib\site-packages\sklearn\preprocessing\data.py in fit(self, X, y)
290 # Reset internal state before fitting
291 self._reset()
--> 292 return self.partial_fit(X, y)
293
294 def partial_fit(self, X, y=None):
c:\users\engine\appdata\local\programs\python\python35\lib\site-packages\sklearn\preprocessing\data.py in partial_fit(self, X, y)
316
317 X = check_array(X, copy=self.copy, ensure_2d=False, warn_on_dtype=True,
--> 318 estimator=self, dtype=FLOAT_DTYPES)
319
320 if X.ndim == 1:
c:\users\engine\appdata\local\programs\python\python35\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
380 force_all_finite)
381 else:
--> 382 array = np.array(array, dtype=dtype, order=order, copy=copy)
383
384 if ensure_2d:
ValueError: could not convert string to float: 'Feature458'
I would be glad for any hints on how to solve this!
Answer 0: (score: 1)
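Well, that is because fit_transform() on preprocessing.MinMaxScaler() returns a NumPy array, not a DataFrame.
When you build a DataFrame from that array, it knows nothing about your original column names, so pandas falls back to the default integer labels.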
You can try something like
normalized = pd.DataFrame(np_scaled, columns=df.columns)
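As a minimal sketch of that fix: the small DataFrame below is a made-up stand-in for Training.csv (the column names and values are assumptions, not the asker's data). It also passes index=df.index, which is optional but keeps the row labels aligned with the original frame.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for the contents of Training.csv
df = pd.DataFrame({"Feature1": [1.0, 2.0, 3.0],
                   "Feature2": [10.0, 20.0, 40.0]})

scaler = MinMaxScaler()
np_scaled = scaler.fit_transform(df)  # plain NumPy array, labels are lost here

# Re-attach the original column names (and row index) when rebuilding the frame
normalized = pd.DataFrame(np_scaled, columns=df.columns, index=df.index)

print(normalized.columns.tolist())
print(normalized.describe())
```

With the labels restored, normalized.describe() reports the statistics under the original feature names instead of 0, 1, 2, and so on.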
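In the second case (with header=None), the header row is simply read in as the first row of data. When sklearn then tries to convert those column names to floats, you get the error.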
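To see this effect in isolation, here is a small sketch that reads the same CSV text both ways. The CSV content is invented for illustration and is read from an in-memory string rather than from Training.csv.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for Training.csv
csv_text = "Feature1,Feature2\n1,10\n2,20\n3,40\n"

# Default: the first line is used as the column names
with_header = pd.read_csv(io.StringIO(csv_text))

# header=None: the first line is treated as an ordinary data row
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(with_header.shape, no_header.shape)
print(no_header.iloc[0].tolist())  # the header strings now sit in row 0
print(with_header.dtypes)          # numeric columns
print(no_header.dtypes)            # non-numeric, because each column mixes text and numbers
```

Because every column in no_header now contains a string such as 'Feature1', the scaler cannot cast the data to float. Keeping the default header handling and restoring the names after scaling avoids the problem.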