Error when converting a numerical array to sparse

Time: 2016-11-27 22:21:29

Tags: python scikit-learn one-hot-encoding

I am working with a huge dataset, and I run into a problem when converting a numerical array to a sparse matrix.

import pandas as pd
l = pd.read_csv('merge_from_ofoct.csv')
l.drop('Unnamed: 12', axis=1, inplace=True)
l.drop('CRS_ARR_TIME', axis=1, inplace=True)
l.drop('CRS_DEP_TIME', axis=1, inplace=True)
l = l[(l.T != 0).any()]
count_nan = len(l) - l.count()  # number of missing values in each column
l_no_missing = l.dropna()  # drop the rows with missing values
f = l_no_missing  # final dataframe with no missing values
count_nan = len(f) - f.count()  # verify that the missing values are removed
count_nan
airport_data = pd.read_csv('Airport_data.csv', 
                               header = 0) 
training.drop(training.columns[0], axis = 1, inplace = True)
f['CARRIER'] = f['UNIQUE_CARRIER']
f["CARRIER"] = pd.factorize(f["CARRIER"])[0]
CARRIER = f[['UNIQUE_CARRIER', 'CARRIER']].drop_duplicates()
training = f
training.drop('UNIQUE_CARRIER', axis = 1, inplace = True)
scalingDF = training[['DISTANCE']] # Numerical features
categDF = training[['MONTH', 'DAY_OF_MONTH', 'ORIGIN_AIRPORT_ID', 
                   'DEST_AIRPORT_ID', 
                   'CARRIER', 'DAY_OF_WEEK']] # Categorical features


from sklearn.preprocessing import OneHotEncoder

encoder = OneHotEncoder() # Create encoder object
categDF_encoded = encoder.fit_transform(categDF) 

type(categDF_encoded) 
from scipy import sparse # Need this to create a sparse array
scalingDF_sparse = sparse.csr_matrix(scalingDF)  # fails: can't convert numerical array to sparse
  

TypeError                                 Traceback (most recent call last)
 in ()
      1 from scipy import sparse # Need this to create a sparse array
----> 2 scalingDF_sparse = sparse.csr_matrix(scalingDF)

/Users/nikhil_maladkar/anaconda/lib/python2.7/site-packages/scipy/sparse/compressed.pyc in __init__(self, arg1, shape, dtype, copy)
     67                         self.format)
     68                 from .coo import coo_matrix
---> 69                 self._set_self(self.__class__(coo_matrix(arg1, dtype=dtype)))
     70
     71         # Read matrix dimensions given, if any

/Users/nikhil_maladkar/anaconda/lib/python2.7/site-packages/scipy/sparse/compressed.pyc in __init__(self, arg1, shape, dtype, copy)
     29                 arg1 = arg1.copy()
     30             else:
---> 31                 arg1 = arg1.asformat(self.format)
     32             self._set_self(arg1)
     33

/Users/nikhil_maladkar/anaconda/lib/python2.7/site-packages/scipy/sparse/base.pyc in asformat(self, format)
    218             return self
    219         else:
--> 220             return getattr(self, 'to' + format)()
    221
    222     ####################################################################

/Users/nikhil_maladkar/anaconda/lib/python2.7/site-packages/scipy/sparse/coo.pyc in tocsr(self)
    328             indptr = np.empty(M + 1, dtype=idx_dtype)
    329             indices = np.empty(self.nnz, dtype=idx_dtype)
--> 330             data = np.empty(self.nnz, dtype=upcast(self.dtype))
    331
    332             coo_tocsr(M, N, self.nnz,

/Users/nikhil_maladkar/anaconda/lib/python2.7/site-packages/scipy/sparse/sputils.pyc in upcast(*args)
     55             return t
     56
---> 57     raise TypeError('no supported conversion for types: %r' % (args,))
     58
     59

TypeError: no supported conversion for types: (dtype('O'),)
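
For context, `dtype('O')` in the last line is NumPy's object dtype, so the frame passed to `csr_matrix` was not stored as plain numbers. Why `DISTANCE` ended up as object in this dataset is not shown in the question. The sketch below assumes the values are numeric strings; the sample data and the names `scaling_df`, `distance`, `n_bad` and `scaling_sparse` are made up for illustration. It shows one possible way to check the dtype and coerce the column to `float64` before building the sparse matrix.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical stand-in for training[['DISTANCE']]: values held as Python
# objects (numeric strings here, which is an assumption about the real data).
scaling_df = pd.DataFrame({"DISTANCE": ["2475", "1389", "733"]}, dtype=object)

# Inspect the column dtype; object dtype is what the traceback reports as 'O'.
print(scaling_df["DISTANCE"].dtype)

# Coerce to a numeric dtype. errors="coerce" turns entries that cannot be
# parsed into NaN instead of raising, so count them afterwards.
distance = pd.to_numeric(scaling_df["DISTANCE"], errors="coerce")
n_bad = int(distance.isna().sum())

# Build the sparse matrix from an explicit float64 column vector.
scaling_sparse = sparse.csr_matrix(
    distance.to_numpy(dtype=np.float64).reshape(-1, 1)
)
```

If `n_bad` is greater than zero on the real data, some entries in the column are not parseable as numbers and would need to be inspected before the conversion is trusted.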

0 answers:

No answers