Question

I'm trying to use OneHotEncoder, but without success. Any help is appreciated.

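My CSV file has 9 columns. My independent-variable array X consists of 8 of them: the first 7 are categorical variables, and the last one, DURATION_SEC, is a float.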

My code is as follows:

# Importing the libraries
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Importing the dataset
dataset = pd.read_csv ( 'csv_final_results/KYC_READY_FOR_ANALYTICS.csv',
                        index_col = [ 'LE_COUNTRY', 'LE_INDUSTRY', 'CATEGORY_ID', 'REQUEST_ID', 'STATUS_ID',
                                      'TYPE_OF_BUSINESS_ID', 'STATUS_NODE', 'DURATION_SEC', 'STATUS_TYPE' ],
                        engine = 'c',
                        usecols = [ 'LE_COUNTRY', 'LE_INDUSTRY', 'CATEGORY_ID', 'REQUEST_ID', 'STATUS_ID',
                                    'TYPE_OF_BUSINESS_ID', 'STATUS_NODE', 'DURATION_SEC', 'STATUS_TYPE' ] ).reset_index ()
X = dataset.iloc [ :, 0 :8 ].values
y = dataset.iloc [ :, 8 ].values

# Encoding categorical data
# Encoding the Independent Variable

from sklearn.preprocessing import LabelEncoder, OneHotEncoder

labelencoder_X_Country = LabelEncoder ()
X [ :, 0 ] = labelencoder_X_Country.fit_transform ( X [ :, 0 ] )
labelencoder_X_Industry = LabelEncoder ()
X [ :, 1 ] = labelencoder_X_Industry.fit_transform ( X [ :, 1 ] )
labelencoder_X_Category = LabelEncoder ()
X [ :, 2 ] = labelencoder_X_Category.fit_transform ( X [ :, 2 ] )
labelencoder_X_Request = LabelEncoder ()
X [ :, 3 ] = labelencoder_X_Request.fit_transform ( X [ :, 3 ] )
labelencoder_X_Status = LabelEncoder ()
X [ :, 4 ] = labelencoder_X_Status.fit_transform ( X [ :, 4 ] )
labelencoder_X_TypeOfBusiness = LabelEncoder ()
X [ :, 5 ] = labelencoder_X_TypeOfBusiness.fit_transform ( X [ :, 5 ] )
labelencoder_X_StatusNode = LabelEncoder ()
X [ :, 6 ] = labelencoder_X_StatusNode.fit_transform ( X [ :, 6 ] )


onehotencoder = OneHotEncoder ( categories = 'auto' )
X = onehotencoder.fit_transform ( X ).toarray ()

labelencoder_y = LabelEncoder ()
y = labelencoder_y.fit_transform ( y )

I don't want to encode DURATION_SEC, because it is not a categorical variable.

However, once I call fit_transform on X, I lose the DURATION_SEC information.
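A minimal sketch of why this happens, using hypothetical toy data rather than the asker's CSV and assuming scikit-learn 0.20 or later: with `categories='auto'`, OneHotEncoder treats every column it is given as categorical, so each distinct duration value becomes its own 0/1 indicator column and the numeric value itself no longer appears in the output.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one already label-encoded categorical column,
# followed by a float column standing in for DURATION_SEC.
X = np.array([
    [0, 12.5],
    [1, 30.0],
    [0, 47.25],
])

encoder = OneHotEncoder(categories="auto")
encoded = encoder.fit_transform(X).toarray()

# 2 indicators for the categorical column + 3 indicators for the
# 3 distinct durations; the output contains only 0s and 1s.
print(encoded.shape)  # (3, 5)
print(encoder.categories_[1])  # the durations were treated as category labels
```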

How can I avoid losing the duration information while encoding all the other columns?

Thanks for your help.
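One possible workaround, sketched with hypothetical toy data and assuming scikit-learn 0.20 or later (where OneHotEncoder also accepts string columns directly): one-hot encode only the categorical slice of X, then stack the untouched duration column back on. The variable `n_categorical` is illustrative; for the question's array it would be 7.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data shaped like the question's X: categorical columns
# first (only two here for brevity), the float duration column last.
X = np.array([
    ["DE", "Bank", 12.5],
    ["FR", "Bank", 30.0],
    ["DE", "Insurance", 47.25],
], dtype=object)

n_categorical = 2  # would be 7 for the question's data

# One-hot encode only the categorical block (strings are accepted as-is)
encoder = OneHotEncoder(categories="auto")
categorical = encoder.fit_transform(X[:, :n_categorical]).toarray()

# Keep the duration column numeric and append it after the indicators
duration = X[:, n_categorical:].astype(float)
X_encoded = np.hstack([categorical, duration])

print(X_encoded.shape)   # (3, 5)
print(X_encoded[:, -1])  # the original durations, unchanged
```

The same slicing also works on the question's label-encoded X, since OneHotEncoder only needs the first seven columns to hold discrete values.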

OneHotEncoder does not understand non-categorical variables
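An alternative sketch, again assuming scikit-learn 0.20 or later, uses `ColumnTransformer` so that the encoder only ever sees the categorical columns, while `remainder='passthrough'` forwards DURATION_SEC unchanged. The DataFrame below is hypothetical sample data using two of the question's column names; with the real dataset, `categorical_cols` would list all seven categorical columns (or the integer indices 0 to 6 when passing the NumPy array X).

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical sample rows; the real data has seven categorical columns.
df = pd.DataFrame({
    "LE_COUNTRY": ["DE", "FR", "DE"],
    "LE_INDUSTRY": ["Bank", "Bank", "Insurance"],
    "DURATION_SEC": [12.5, 30.0, 47.25],
})

categorical_cols = ["LE_COUNTRY", "LE_INDUSTRY"]

# One-hot encode only the listed columns; every other column
# (here DURATION_SEC) is passed through and appended after the indicators.
preprocess = ColumnTransformer(
    transformers=[("onehot", OneHotEncoder(categories="auto"), categorical_cols)],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense NumPy array
)

X_encoded = preprocess.fit_transform(df)
print(X_encoded.shape)  # (3, 5)
```

Because passthrough columns come after the transformed ones, DURATION_SEC ends up as the last column of `X_encoded`.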
