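I have a multi-class label prediction problem, identifying fruits as an example. I am able to get predictions from the model using the fit and predict functions, and I have trained and tested the model. The code is below. I am trying to merge the predicted y values in the variable "forest_y_pred" back into my original dataset, so that I can compare the original target variable with the predicted target variable in a single DataFrame. I have 2 questions:
1) y_test is the same as forest_y_pred = forest.predict(X_test). When I compare them, I get exactly the same results. Am I getting this wrong? I am a bit confused here: isn't predict() supposed to predict from X_test, rather than reproduce exactly the same values as y_test?
2) I am trying to merge forest_y_pred = forest.predict(X_test) back into df. Here is what I tried, based on: Merging results from model.predict() with original pandas DataFrame?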
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import pandas as pd
# Load Data
df = pd.read_excel('../data/file.xlsx',converters={'col1':str})
df = df.set_index('INDEX_ID') # Setting index id
df
# Doing this way because of setting index. INDEX_ID is a column in the df
X_train, X_test, y_train, y_test = train_test_split(df.loc[:, ~df.columns.isin(['Target'])], df.Target, train_size=0.5)
print(y_test[:5])
type(y_test) #pandas.core.series.Series
# Output:
# ID
# 12     Apples
# 124    Oranges
# 345    Apples
# 123    Oranges
# 232    Kiwi
forest = RandomForestClassifier()
# Training
forest_model = forest.fit(X_train, y_train)
print(forest_model)
# Predictions
forest_y_pred = forest.predict(X_test)
print("forest_y_pred:\n",forest_y_pred[:5])
# Output: ['Apples','Oranges','Apples','Oranges','Kiwi']
y_test['preds'] = forest_y_pred
print(y_test['preds'][:5])
# Output: ['Apples','Oranges','Apples','Oranges','Kiwi']
df_out = pd.merge(df,y_test[['preds']],how = 'left',left_index = True, right_index = True)
# ValueError: can not merge DataFrame with instance of type <class 'pandas.core.series.Series'>
# How do I fix this? I have tried many ways of converting between ndarray, Series, and DataFrame, but nothing I have tried has worked so far. Thanks a bunch!!
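
For reference, here is a minimal sketch of one way the predictions might be attached to the original DataFrame and compared with the target. It is not the original code: the data is synthetic, and the column names feat1 and feat2, the index values, and the random_state settings are assumptions made only for illustration. The idea is to wrap the NumPy array returned by predict() in a named Series that reuses the index of X_test, so that pandas can align each prediction with its row.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the spreadsheet in the question (synthetic data).
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "feat1": rng.normal(size=60),
        "feat2": rng.normal(size=60),
        "Target": rng.choice(["Apples", "Oranges", "Kiwi"], size=60),
    },
    index=pd.Index(range(100, 160), name="INDEX_ID"),
)

X = df.drop(columns=["Target"])
y = df["Target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.5, random_state=0
)

forest = RandomForestClassifier(random_state=0)
forest.fit(X_train, y_train)
forest_y_pred = forest.predict(X_test)  # plain NumPy array, carries no index

# Wrap the predictions in a named Series that reuses X_test's index, so
# pandas can align each prediction with the row it was made for.
preds = pd.Series(forest_y_pred, index=X_test.index, name="preds")

# Left join on the index: rows used for training get NaN in "preds".
df_out = df.join(preds, how="left")

# Compare actual and predicted labels on the test rows only.
test_rows = df_out.loc[X_test.index]
agreement = (test_rows["Target"] == test_rows["preds"]).mean()

print(df_out.head())
print(f"Share of test rows where the prediction equals Target: {agreement:.2f}")
```

Computing the agreement over all test rows, rather than eyeballing the first five, gives a fairer picture for question 1. If the share turns out to be exactly 1.0 on real data, it may be worth checking whether some feature column duplicates or closely encodes the target.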