Losing the CSV file headers after normalization

Time: 2017-05-03 10:09:02

Tags: python csv pandas

I have written the following code to read a CSV file and run column-wise normalization:

from sklearn import preprocessing
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# reading Train values
Training = 'Training.csv'
df = pd.read_csv(Training)
# drop the last column; done twice, so the last two columns are removed
df = df.drop(df.columns[len(df.loc[1]) - 1], axis=1)
df = df.drop(df.columns[len(df.loc[1]) - 1], axis=1)
df.describe()
# scale every column to the [0, 1] range
minmax_scaler = preprocessing.MinMaxScaler()
np_scaled = minmax_scaler.fit_transform(df)
normalized = pd.DataFrame(np_scaled)
normalized.describe()
np.shape(df)
np.shape(normalized)
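For comparison, the same column-wise min-max scaling can be written in pandas alone, in which case the column labels survive automatically because DataFrame/Series arithmetic aligns on labels. This is a minimal sketch, not the asker's method: the toy DataFrame is a hypothetical stand-in for the trimmed Training.csv, and it assumes every column is numeric and none is constant (a constant column would divide by zero).

```python
import pandas as pd

# Hypothetical numeric data standing in for the trimmed Training.csv
df = pd.DataFrame({"Feature1": [1.0, 2.0, 3.0], "Feature2": [10.0, 20.0, 40.0]})

# Column-wise min-max scaling: (x - min) / (max - min), computed per column.
# df.min() and df.max() return Series indexed by column name, and arithmetic
# between a DataFrame and a Series aligns on those labels, so the result
# keeps df's headers and index.
col_min = df.min()
col_max = df.max()
normalized = (df - col_min) / (col_max - col_min)
print(normalized)
```

Unlike MinMaxScaler, this does not store the fitted minimum and maximum for reuse on a test set; you would have to keep col_min and col_max yourself.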

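My question is: why don't I see the column headers in normalized, even though it has the same shape as df? I also tried reading the CSV file without the header, but then the program crashes: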

..............................
df       = pd.read_csv(Training,header=None)
.........................

This produces the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-15-dd18ba2a6204> in <module>()
     14 df.describe()
     15 minmax_scaler= preprocessing.MinMaxScaler()
---> 16 np_scaled = minmax_scaler.fit_transform(df)
     17 normalized = pd.DataFrame(np_scaled)
     18 normalized.describe()

c:\users\engine\appdata\local\programs\python\python35\lib\site-packages\sklearn\base.py in fit_transform(self, X, y, **fit_params)
    492         if y is None:
    493             # fit method of arity 1 (unsupervised transformation)
--> 494             return self.fit(X, **fit_params).transform(X)
    495         else:
    496             # fit method of arity 2 (supervised transformation)

c:\users\engine\appdata\local\programs\python\python35\lib\site-packages\sklearn\preprocessing\data.py in fit(self, X, y)
    290         # Reset internal state before fitting
    291         self._reset()
--> 292         return self.partial_fit(X, y)
    293 
    294     def partial_fit(self, X, y=None):

c:\users\engine\appdata\local\programs\python\python35\lib\site-packages\sklearn\preprocessing\data.py in partial_fit(self, X, y)
    316 
    317         X = check_array(X, copy=self.copy, ensure_2d=False, warn_on_dtype=True,
--> 318                         estimator=self, dtype=FLOAT_DTYPES)
    319 
    320         if X.ndim == 1:

c:\users\engine\appdata\local\programs\python\python35\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    380                                       force_all_finite)
    381     else:
--> 382         array = np.array(array, dtype=dtype, order=order, copy=copy)
    383 
    384         if ensure_2d:

ValueError: could not convert string to float: 'Feature458'

I'd appreciate any hints on how to fix this!

1 Answer:

Answer 0 (score: 1)

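Well, that's because MinMaxScaler's fit_transform() returns a NumPy array, not a DataFrame. When you build a DataFrame from that array, it knows nothing about your original columns, so pandas simply labels them 0, 1, 2, ...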

You can try something like:
normalized = pd.DataFrame(np_scaled, columns=df.columns)
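A self-contained sketch of that fix, assuming scikit-learn is installed; the small DataFrame (and its string index) is a hypothetical stand-in for Training.csv. It shows that fit_transform hands back a bare ndarray and that passing both columns= and index= restores the original labels.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for Training.csv: two numeric feature columns
df = pd.DataFrame(
    {"Feature1": [1.0, 2.0, 3.0], "Feature2": [10.0, 20.0, 40.0]},
    index=["a", "b", "c"],
)

scaler = MinMaxScaler()
np_scaled = scaler.fit_transform(df)  # plain NumPy ndarray, no column labels
print(type(np_scaled))  # <class 'numpy.ndarray'>

# Re-attach the original column names (and index) when rebuilding the DataFrame
normalized = pd.DataFrame(np_scaled, columns=df.columns, index=df.index)
print(normalized)
```

Passing index=df.index matters only if the original index is not the default 0..n-1 range. In scikit-learn 1.2 and later, calling set_output(transform="pandas") on the scaler makes fit_transform return a DataFrame directly; check your installed version before relying on it.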

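In your second attempt (header=None), pandas no longer treats the first line as the header, so the column names simply end up as the first data row. When sklearn tries to convert those names (for example 'Feature458') to floats, you get the error.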
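The effect of header=None can be reproduced without the real file. This sketch uses a hypothetical two-column CSV string in place of Training.csv: with header=None the header line becomes row 0, every column turns into strings (dtype object), and converting to float fails with the same kind of ValueError; with the default header the data stays numeric.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for Training.csv (header line + two data rows)
csv_text = "Feature457,Feature458\n1.5,2.0\n3.0,4.5\n"

# header=None: the header line is read as an ordinary data row,
# so each column mixes names and numbers and is stored as strings (object)
raw = pd.read_csv(io.StringIO(csv_text), header=None)
print(raw.dtypes)
print(raw.iloc[0].tolist())  # ['Feature457', 'Feature458']

# The same float conversion sklearn attempts fails on the name strings
try:
    raw.to_numpy(dtype=float)
except ValueError as exc:
    print("conversion failed:", exc)

# Default header=0: the first line becomes the column labels, data stays numeric
df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)
```

If a file really has no header line, header=None is the right choice; if it has one that you want to discard, pass skiprows=1 together with header=None.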