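I have a dataframe with categorical variables and I want to apply OneHotEncoder to it. I can get it to work by running LabelEncoder before OneHotEncoder, but that does not make sense to me, because since the latest updates OneHotEncoder accepts strings for categorical variables.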
Sample dataframe you can test the code on:
import pandas as pd
data = pd.DataFrame({'col1': {0: 'ab321', 1: 'ab568', 2: 'mkld78'},
                     'col2': {0: 'Red', 1: 'Blue', 2: 'Green'},
                     'col3': {0: 'First', 1: 'Second', 2: 'Third'},
                     'col4': {0: 'Wisconsin', 1: 'California', 2: 'Portland'},
                     'col5': {0: 'a', 1: 'f', 2: 'g'},
                     'col6': {0: 1, 1: 2, 2: 3},
                     'target': {0: 0, 1: 0, 2: 1}})
Here is what I have tried:
I tried to get around the error using both index values and column names:
#Index
# OneHotEncoding
from sklearn.preprocessing import OneHotEncoder
import numpy as np
import pandas as pd
#Load data
train = pd.read_csv("data_train.csv")
test = pd.read_csv("data_test.csv")
X= train.drop(["target"], axis = 1)
y= train["target"]
# Filter categorical columns
categorical_columns = ["col1","col2","col3","col4","col5"]
categorical_indexes = np.where(X.dtypes == 'object')[0]
# OHE
ohe = OneHotEncoder(categorical_features = categorical_columns)
# reshape data
for index in categorical_indexes:
    X.iloc[:,index] = ohe.fit_transform(X.iloc[:,index].values.reshape(-1,1))
Error traceback:
#Column Names
# OneHotEncoding
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
train = pd.read_csv("data_train.csv")
test = pd.read_csv("data_test.csv")
X= train.drop(["target"], axis = 1)
y= train["target"]
# Filter categorical columns
categorical_columns = ["col1","col2","col3","col4","col5"]
categorical_indexes = np.where(X.dtypes == 'object')[0]
# OHE
ohe = OneHotEncoder(categorical_features = categorical_columns)
# reshape data
for column in categorical_columns:
    X[column] = ohe.fit_transform(X[column].values.reshape(-1,1))
Answer 0 (score: 1)
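You are missing the concept behind OneHotEncoder. It is meant to be fitted once on the whole training set, not column by column: each input column expands into several output columns, so the result cannot be assigned back into a single column.
Use this: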
data = pd.DataFrame({'col1': {0: 'ab321', 1: 'ab568', 2: 'mkld78'},
'col2': {0: 'Red', 1: 'Blue', 2: 'Green'},
'col3': {0: 'First', 1: 'Second', 2: 'Third'},
'col4': {0: 'Wisconsin', 1: 'California', 2: 'Portland'},
'col5': {0: 'a', 1: 'f', 2: 'g'},
'col6': {0: 1, 1: 2, 2: 3},
'target': {0: 0, 1: 0, 2: 1}})
# OneHotEncoding
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
train = data.iloc[0:2,:]
test = data.iloc[2:,:]
X= train.drop(["target"], axis = 1)
y= train["target"]
# Filter categorical columns
categorical_columns = ["col1","col2","col3","col4","col5"]
categorical_indexes = np.where(X.dtypes == 'object')[0]
# OHE
ohe = OneHotEncoder()
X_ = ohe.fit_transform(X)
X_
# <2x12 sparse matrix of type '<type 'numpy.float64'>'
# with 12 stored elements in Compressed Sparse Row format>
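Note that the snippet above fits the encoder on every column of X, including the numeric col6, which is why the result has 12 columns (six input columns with two distinct values each). As a minimal sketch, assuming you only want to encode the string columns and then reuse the fitted encoder on the test split, something like the following may be closer to what the question needs. The choice of handle_unknown="ignore" and the names X_train_cat and X_test_cat are my own additions for illustration, not part of the original answer.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Same toy data as in the question.
data = pd.DataFrame({
    "col1": ["ab321", "ab568", "mkld78"],
    "col2": ["Red", "Blue", "Green"],
    "col3": ["First", "Second", "Third"],
    "col4": ["Wisconsin", "California", "Portland"],
    "col5": ["a", "f", "g"],
    "col6": [1, 2, 3],
    "target": [0, 0, 1],
})

train = data.iloc[0:2, :]
test = data.iloc[2:, :]
categorical_columns = ["col1", "col2", "col3", "col4", "col5"]

# Assumption: categories not seen during fit should become all-zero rows
# instead of raising an error at transform time.
ohe = OneHotEncoder(handle_unknown="ignore")

# Fit once on all categorical columns of the training split ...
X_train_cat = ohe.fit_transform(train[categorical_columns])
# ... and only transform the test split with the already fitted encoder.
X_test_cat = ohe.transform(test[categorical_columns])

print(X_train_cat.shape)     # one output column per (input column, category) pair
print(X_test_cat.toarray())  # the test row holds only unseen categories here
```

With this toy split every category in the test row is unseen, so its encoded row contains only zeros. On real data the training set would normally cover most categories, and the numeric columns such as col6 can be kept aside and joined back to the encoded matrix afterwards.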