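I'm trying to run decision tree regression on my raw dataset, but the dataset contains string values. I've used LabelEncoder to convert the string values to numbers so they can be used for decision tree regression. Now I want to put the NumPy array values back into the original Excel sheet (data.xlsx) so that I can run the regression. Can anyone tell me how to do this? Here is my code: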
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder  # what I used to encode the string values

df = pd.read_excel(r'C:\Users\user\Desktop\data.xlsx')  # data.xlsx is the original data file
le = LabelEncoder()
df = np.array(df, dtype="object")  # note: this replaces the DataFrame with a plain object array
df[:, 11] = le.fit_transform(df[:, 11])  # encode the string column at index 11
df[:, :]  # my results
array([['4', '1', '2011', ..., '19', '3916', '46135'],
['4', '0', '2011', ..., '19', '3916', '40650'],
['4', '0', '2011', ..., '20', '3916', '36350'],
...,
['0', '901', '2012', ..., '16', '204', '50620'],
['0', '901', '2013', ..., '16', '204', '50920'],
['25', '902', '2006', ..., '17', '61', '28995']], dtype=object)
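The output above is my array. I want to write this output back into the original Excel data file. Can anyone show me how to put these values back into my original dataset?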
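One possible direction, given only as a minimal sketch: rather than converting the whole DataFrame to a NumPy array, the encoded values could be assigned straight back to the DataFrame column, which keeps the column names and makes saving easier. The helper `encode_column`, the sample data, the column name `city`, and the output file name `data_encoded.xlsx` are all hypothetical and not from my actual dataset; with the real file, the column at position 11 could presumably be selected with `df.columns[11]`.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def encode_column(df, column):
    """Return a copy of df with `column` replaced by integer labels, plus the fitted encoder."""
    encoded = df.copy()
    le = LabelEncoder()
    # astype(str) is an assumption: it guards against mixed types in the column
    encoded[column] = le.fit_transform(encoded[column].astype(str))
    return encoded, le


# Hypothetical stand-in for the real spreadsheet contents
sample = pd.DataFrame({
    "year": [2011, 2012, 2013],
    "city": ["Paris", "Berlin", "Paris"],
})

encoded, le = encode_column(sample, "city")
print(encoded)

# Writing the result to a new Excel file (requires an Excel writer such as openpyxl).
# Left commented out so the sketch runs without touching the file system:
# encoded.to_excel("data_encoded.xlsx", index=False)
```

Keeping the fitted encoder around means the numeric labels can be mapped back to the original strings later with `le.inverse_transform(...)`. Writing to a new file instead of overwriting data.xlsx is a deliberate choice here, so the original strings are not lost.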