I want to encode one column of my dataset with LabelEncoder and the other columns with MinMaxScaler. But the label-encoded column still comes out as float64:
BinEncoder = LabelEncoder()
scalar = MinMaxScaler()
dat = df.values
X = dat[0:500,0:5]
X[:,-1] = BinEncoder.fit_transform(X[:,-1])
X[:,0:4] = scalar.fit_transform(X[:,0:4])
print(X)
print(X) returns:
[[0.35435163 1. 0.96428571 0.05465126 0. ]
[0.07876241 0.85714286 0.85714286 0.04695418 0. ]
[0.11814948 0.64285714 0.5 0.08307676 3. ]
...
[0.25025542 0.79166667 0.54285714 0.10023708 1. ]
[0.25029285 1. 1. 0.0569226 1. ]
[0.25025127 1. 0.82608696 0.06935726 0. ]]
Full code:
import pandas as pd
from sklearn.preprocessing import LabelEncoder,MinMaxScaler
import numpy as np
df = pd.read_csv('./EURUSD_DATAFRAME.csv')
BinEncoder = LabelEncoder()
scalar = MinMaxScaler()
dat = df.values
#print(df.head())
X = dat[0:500,0:5]
Y = dat[:,5]
X[:,4] = BinEncoder.fit_transform(X[:,4])
print(X[:,-1])
X[:,0:4] = scalar.fit_transform(X[:,0:4])
print(X[:,-1])
print(X)
Y=BinEncoder.fit_transform(Y)
X = X.reshape(100,5,5)
#print(X[0])
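Answer 0 (score: 1)
Because you convert the DataFrame to a NumPy array, a single data type has to be chosen for the entire array. With a mix of integer and float columns, that type is float64. If you want each column to have its own data type, you need to keep the data as a DataFrame.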
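import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Example data: 100 rows of random integers in columns A..E
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 5)), columns=list('ABCDE'))
BinEncoder = LabelEncoder()
scalar = MinMaxScaler()
# Keep the data as a DataFrame; .copy() avoids SettingWithCopyWarning
X = df.loc[:500, 'A':'D'].copy()
# Label-encode column D; the result stays an integer column
X['D'] = BinEncoder.fit_transform(X['D'])
# Scale columns A..C; replacing whole columns lets them become float64
scaled_cols = ['A', 'B', 'C']
X[scaled_cols] = scalar.fit_transform(X[scaled_cols])
print(X.dtypes)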
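To see the single-dtype behaviour in isolation, here is a minimal sketch. The column names `price` and `label` and their values are made up for illustration and are not taken from the question's CSV file.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one float feature and one integer label column.
df = pd.DataFrame({"price": [1.5, 2.5, 3.5], "label": [0, 3, 1]})

# A NumPy array holds one dtype, so the integer column is upcast to float64.
arr = df.to_numpy()
print(arr.dtype)

# Writing integers back into a float64 array casts them to float again.
arr[:, 1] = np.array([0, 3, 1], dtype=np.int64)
print(arr[:, 1].dtype)

# A DataFrame keeps one dtype per column, so "label" remains an integer column.
print(df.dtypes)
```

If a NumPy array is still needed later (for example, for a reshape), one option is to do the per-column work in the DataFrame first and convert only at the end, accepting that the array will again have a single dtype.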
Hope that helps.