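I am trying to use OneHotEncoder, but without success. Any help is appreciated.
My CSV file contains 9 columns. My array of independent variables, X, is made up of 8 columns: the first 7 are categorical, and the last one, DURATION_SEC, is a float.
My code is as follows: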
# Importing the libraries
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# Importing the dataset
dataset = pd.read_csv ( 'csv_final_results/KYC_READY_FOR_ANALYTICS.csv', index_col = [
'LE_COUNTRY', 'LE_INDUSTRY', 'CATEGORY_ID', 'REQUEST_ID', 'STATUS_ID',
'TYPE_OF_BUSINESS_ID', 'STATUS_NODE', 'DURATION_SEC', 'STATUS_TYPE' ], engine = 'c',
usecols = [ 'LE_COUNTRY', 'LE_INDUSTRY', 'CATEGORY_ID', 'REQUEST_ID', 'STATUS_ID',
'TYPE_OF_BUSINESS_ID', 'STATUS_NODE', 'DURATION_SEC',
'STATUS_TYPE' ] ).reset_index ()
X = dataset.iloc [ :, 0 :8 ].values
y = dataset.iloc [ :, 8 ].values
# Encoding categorical data
# Encoding the Independent Variable
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X_Country = LabelEncoder ()
X [ :, 0 ] = labelencoder_X_Country.fit_transform ( X [ :, 0 ] )
labelencoder_X_Industry = LabelEncoder ()
X [ :, 1 ] = labelencoder_X_Industry.fit_transform ( X [ :, 1 ] )
labelencoder_X_Category = LabelEncoder ()
X [ :, 2 ] = labelencoder_X_Category.fit_transform ( X [ :, 2 ] )
labelencoder_X_Request = LabelEncoder ()
X [ :, 3 ] = labelencoder_X_Request.fit_transform ( X [ :, 3 ] )
labelencoder_X_Status = LabelEncoder ()
X [ :, 4 ] = labelencoder_X_Status.fit_transform ( X [ :, 4 ] )
labelencoder_X_TypeOfBusiness = LabelEncoder ()
X [ :, 5 ] = labelencoder_X_TypeOfBusiness.fit_transform ( X [ :, 5 ] )
labelencoder_X_StatusNode = LabelEncoder ()
X [ :, 6 ] = labelencoder_X_StatusNode.fit_transform ( X [ :, 6 ] )
onehotencoder = OneHotEncoder ( categories = 'auto' )
X = onehotencoder.fit_transform ( X ).toarray ()
labelencoder_y = LabelEncoder ()
y = labelencoder_y.fit_transform ( y )
I am not label-encoding DURATION_SEC, because it is not a categorical variable.
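However, once I call fit_transform on X, I lose the DURATION_SEC information: the OneHotEncoder is applied to every column of X, so the duration values are treated as categories as well.
How can I avoid losing my duration information while still encoding all the other columns?
Thanks for your help.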
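For reference, one direction that might address this is scikit-learn's ColumnTransformer (available since version 0.20), which applies an encoder to selected columns only and can pass the remaining columns through unchanged. The following is only a minimal sketch under stated assumptions: the toy DataFrame and its values are made up and stand in for the real CSV, and only two of the seven categorical columns are shown.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data standing in for the real CSV (values are made up).
toy = pd.DataFrame({
    "LE_COUNTRY": ["DE", "FR", "DE", "US"],
    "STATUS_NODE": ["A", "B", "A", "A"],
    "DURATION_SEC": [12.5, 30.0, 7.25, 99.0],
})

# Columns to one-hot encode; every other column is left alone.
categorical_cols = ["LE_COUNTRY", "STATUS_NODE"]

# remainder="passthrough" appends the untouched columns (here DURATION_SEC)
# after the one-hot columns. sparse_threshold=0 requests a dense result.
transformer = ColumnTransformer(
    transformers=[("onehot", OneHotEncoder(), categorical_cols)],
    remainder="passthrough",
    sparse_threshold=0,
)

X_encoded = transformer.fit_transform(toy)

# 3 country columns + 2 status-node columns + 1 duration column
print(X_encoded.shape)
# The last column still holds the original durations.
print(X_encoded[:, -1])
```

With a NumPy array such as X in the code above, integer positions (for example list(range(7))) could be given in place of the column names. In scikit-learn 0.20 and later, OneHotEncoder also accepts string categories directly, so the separate LabelEncoder steps for X may not be needed with this approach.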