Python: Combining predicted y labels into a DataFrame

Time: 2018-10-04 15:11:14

Tags: python python-3.x pandas scikit-learn

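I have a multi-class label prediction problem; take identifying fruits as an example. I can build the model and get predictions from the fit and predict functions, and I have trained and tested the model. The code is below. I am trying to merge the predicted y values in the variable "forest_y_pred" back into my original dataset so that I can compare the original target variable with the predicted target variable in a single DataFrame. I have 2 questions: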

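1) y_test and forest_y_pred = forest.predict(X_test) are the same. When I compare them, I get exactly the same results. Am I doing something wrong? I am a bit confused here: predict() is supposed to predict from X_test, so I would not expect it to produce exactly the same values as y_test.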

2) I am trying to merge forest_y_pred = forest.predict(X_test) back into df. This is what I tried, based on the question "Merging results from model.predict() with original pandas DataFrame?".
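On question 1, five matching rows do not show that the two are identical everywhere. A minimal sketch of checking agreement over the whole test set follows. The toy data, the column names "weight" and "size", and the noise level are assumptions made up for illustration; they are not the data from the question. If the full test set really does match perfectly, it may be worth checking whether one of the feature columns encodes the target.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the Excel file in the question.
rng = np.random.RandomState(0)
toy = pd.DataFrame({
    "weight": rng.normal(size=200),
    "size": rng.normal(size=200),
})
# The label depends on the features only loosely, so some errors are plausible.
noise = rng.normal(scale=1.0, size=200)
toy["Target"] = np.where(toy["weight"] + noise > 0, "Apples", "Oranges")

X_train, X_test, y_train, y_test = train_test_split(
    toy.drop(columns="Target"), toy["Target"], train_size=0.5, random_state=0
)

forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X_train, y_train)
forest_y_pred = forest.predict(X_test)

# Compare over the whole test set, not just the first five rows.
test_accuracy = accuracy_score(y_test, forest_y_pred)
n_mismatches = int((y_test.to_numpy() != forest_y_pred).sum())
print("test accuracy:", test_accuracy)
print("mismatched rows:", n_mismatches)
```

Printing only `[:5]` of each, as in the question, cannot distinguish a perfect classifier from one that happens to get the first five rows right.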

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import pandas as pd

# Load Data
df = pd.read_excel('../data/file.xlsx',converters={'col1':str})
df = df.set_index('INDEX_ID') # Setting index id
df

# INDEX_ID is now the index, so select the feature columns by excluding 'Target'
# (.loc replaces the deprecated .ix indexer)
X_train, X_test, y_train, y_test = train_test_split(df.loc[:, ~df.columns.isin(['Target'])], df.Target, train_size=0.5)

print(y_test[:5])
type(y_test) #pandas.core.series.Series

ID
12      Apples
124     Oranges
345     Apples
123     Oranges
232     Kiwi

forest = RandomForestClassifier()

# Training
forest_model = forest.fit(X_train, y_train)
print(forest_model)

# Predictions
forest_y_pred = forest.predict(X_test) 
print("forest_y_pred:\n",forest_y_pred[:5])
['Apples','Oranges','Apples','Oranges','Kiwi']

y_test['preds'] = forest_y_pred
print(y_test['preds'][:5])
['Apples','Oranges','Apples','Oranges','Kiwi']

df_out = pd.merge(df,y_test[['preds']],how = 'left',left_index = True, right_index = True)
# ValueError: can not merge DataFrame with instance of type <class 'pandas.core.series.Series'>
# How do I fix this? I have tried a ton of ways to convert between ndarray, Series, and DataFrame, but nothing has worked so far. Thanks a bunch!
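The ValueError above arises because y_test is a Series, so y_test[['preds']] is still a Series and not a DataFrame. One possible way around it, sketched below under stated assumptions, is to wrap the predictions in a named Series that reuses X_test.index and then join it onto df by index. The small df, the choice of test rows, and the prediction values here are hypothetical stand-ins, not the data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the original df, indexed by INDEX_ID.
df = pd.DataFrame(
    {
        "col1": ["a", "b", "c", "d"],
        "Target": ["Apples", "Oranges", "Apples", "Kiwi"],
    },
    index=pd.Index([12, 124, 345, 232], name="INDEX_ID"),
)

# Pretend rows 124 and 232 ended up in the test split ...
X_test = df.loc[[124, 232], ["col1"]]
# ... and that forest.predict(X_test) returned this array (made-up values).
forest_y_pred = np.array(["Oranges", "Apples"])

# predict() returns a plain ndarray in the row order of X_test, so reusing
# X_test.index labels each prediction with its INDEX_ID.
preds = pd.Series(forest_y_pred, index=X_test.index, name="preds")

# Left join on the index: rows that were not in the test split get NaN.
df_out = df.join(preds, how="left")
print(df_out)
```

Because the join is on the index, the rows used for training keep a missing value in "preds", which makes it easy to filter down to the test rows when comparing "Target" with "preds".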

0 answers:

No answers