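Partially encoding a dataset

Asked: 2018-06-11 18:33:42

Tags: python pandas numpy dataframe dataset

I want to encode one column of my dataset with LabelEncoder and scale the other columns with MinMaxScaler. However, the label-encoded column still comes out as float64.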

BinEncoder = LabelEncoder()
scalar = MinMaxScaler()

dat = df.values
X = dat[0:500,0:5]
X[:,-1] = BinEncoder.fit_transform(X[:,-1])
X[:,0:4] = scalar.fit_transform(X[:,0:4])
print(X)

print(X) returns:

[[0.35435163 1.         0.96428571 0.05465126 0.        ]
 [0.07876241 0.85714286 0.85714286 0.04695418 0.        ]
 [0.11814948 0.64285714 0.5        0.08307676 3.        ]
 ...
 [0.25025542 0.79166667 0.54285714 0.10023708 1.        ]
 [0.25029285 1.         1.         0.0569226  1.        ]
 [0.25025127 1.         0.82608696 0.06935726 0.        ]]

Full code:

import pandas as pd
from sklearn.preprocessing import LabelEncoder,MinMaxScaler
import numpy as np

df = pd.read_csv('./EURUSD_DATAFRAME.csv')
BinEncoder = LabelEncoder()
scalar = MinMaxScaler()


dat = df.values
#print(df.head())

X = dat[0:500,0:5]
Y = dat[:,5]
X[:,4] = BinEncoder.fit_transform(X[:,4])
print(X[:,-1])
X[:,0:4] = scalar.fit_transform(X[:,0:4])
print(X[:,-1])
print(X)
Y=BinEncoder.fit_transform(Y)

X = X.reshape(100,5,5)
#print(X[0])

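1 answer:

Answer 0 (score: 1)

Because you convert the DataFrame to a NumPy array with df.values, the whole array must share a single data type, so the integer labels are stored as floats alongside the scaled columns. If you want each column to have its own data type, keep the data as a DataFrame.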

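import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
import numpy as np

# Sample data: 100 rows of random integers in columns A to E
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 5)), columns=list('ABCDE'))
BinEncoder = LabelEncoder()
scalar = MinMaxScaler()

# Copy the slice so the assignments below modify X, not a view of df
X = df.loc[:500, 'A':'D'].copy()
# Label-encode column D; it keeps an integer dtype
X['D'] = BinEncoder.fit_transform(X['D'])
# Scale columns A to C to [0, 1]; replacing whole columns lets them become float64
X[['A', 'B', 'C']] = scalar.fit_transform(X[['A', 'B', 'C']])
print(X.dtypes)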
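
As a rough illustration of the single-dtype behaviour described above, the sketch below uses a small made-up DataFrame (the column names feature and label are hypothetical and not taken from the question's CSV). It shows that the DataFrame keeps one dtype per column, that converting it to a NumPy array upcasts everything to float64, and that the label column can be pulled out and cast back to integers if a plain array is needed.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one scaled float feature and one integer label column.
df = pd.DataFrame({
    "feature": [0.1, 0.5, 0.9],
    "label": [0, 3, 1],
})

# Inside the DataFrame each column keeps its own dtype (float64 and int64).
print(df.dtypes)

# Converting to a NumPy array forces one common dtype, so the labels become floats.
arr = df.to_numpy()
print(arr.dtype)
print(arr[:, -1])

# If an array is required, the label column can be extracted and cast separately.
labels = arr[:, -1].astype(np.int64)
print(labels)
```

If the model downstream needs a single numeric array anyway, keeping the features in one float array and the labels in a separate integer array is one way to avoid the mixed-dtype problem.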

Hope that helps.