Question

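I have two CSV files: a training dataset and a test dataset. The training dataset has 36 columns, one of which is the outcome, with values A-F. The test dataset has the other 35 columns and no outcome column. I want to add the outcome column to the test dataset as well. I searched some tutorials but couldn't find the approach I should follow. Can anyone tell me what process to follow?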

Answer 1

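It depends on how you find or compute the outcome you need to add.

One way is to load the test dataset as a pandas DataFrame, compute the outcomes, collect the values in a list, and then add that list to the DataFrame as a new column: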

import pandas as pd

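# Toy stand-in for the test set; with a real file you would use data = pd.read_csv('test.csv') (placeholder name)
data = pd.DataFrame({
    'Names': ['John', 'Nicole', 'Evan'],
    'Age': [53, 23, 27],
})

# Compute one outcome per row and collect the values in a list
outcome = [6545, 5252, 85665]

# Add the list as a new column; its length must equal the number of rows
data['Outcome'] = outcome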
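If the outcome can be derived from existing columns by a known rule, you can compute the whole column at once instead of building the list by hand. A minimal sketch, assuming a hypothetical rule that grades each row A-F by `Age` thresholds; the inline CSV text is a made-up stand-in for your real test file:

```python
import io

import pandas as pd

# Hypothetical stand-in for the test CSV; replace with pd.read_csv('your_test_file.csv')
csv_text = """Names,Age
John,53
Nicole,23
Evan,27
"""
data = pd.read_csv(io.StringIO(csv_text))

# Hypothetical rule: bin Age into six right-closed intervals labelled A-F
bins = [0, 20, 30, 40, 50, 60, 200]
labels = ['A', 'B', 'C', 'D', 'E', 'F']
data['Outcome'] = pd.cut(data['Age'], bins=bins, labels=labels)

print(data)
```

This only applies when a deterministic rule exists; if the outcome has to be learned from the training data, a model is needed instead.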

Answer 2

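You haven't provided any sample data or said which technique you want to use, so the code below is only meant to give you a general idea of how to make predictions: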

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

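Assuming you have read the two CSV files into DataFrames named train and test, for example (the file names are placeholders):

train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')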
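Before training, it is worth checking that the test file really has the same feature columns as the training file minus `Outcome`, because the model expects identical columns in the same order. A minimal sketch; the tiny inline DataFrames and the column names `f1` and `f2` are hypothetical stand-ins for your two CSV files:

```python
import pandas as pd

# Hypothetical stand-ins for pd.read_csv('train.csv') and pd.read_csv('test.csv')
train = pd.DataFrame({'f1': [1, 2], 'f2': [3.0, 4.0], 'Outcome': ['A', 'F']})
test = pd.DataFrame({'f2': [5.0], 'f1': [6]})

feature_cols = [c for c in train.columns if c != 'Outcome']

# Columns present in one file but not the other
missing_in_test = set(feature_cols) - set(test.columns)
extra_in_test = set(test.columns) - set(feature_cols)
print('missing in test:', missing_in_test)
print('extra in test:', extra_in_test)

# Reorder the test columns to match the training order (raises KeyError if any are missing)
test = test[feature_cols]
print(list(test.columns))
```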

X_train = train.loc[:, train.columns != 'Outcome'] # <---- Here you exclude the Outcome from train
y_train = train['Outcome'] # <---- This is your Outcome 

le = LabelEncoder()
y_train = le.fit_transform(y_train) # <---- Here you convert your A-F values to numeric(0-5)
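`LabelEncoder` sorts the distinct labels, so A-F become 0-5 in alphabetical order, and `inverse_transform` maps numeric predictions back to letters. A small illustration with made-up labels:

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
encoded = le.fit_transform(['C', 'A', 'F', 'B', 'E', 'D', 'A'])
print(le.classes_.tolist())  # ['A', 'B', 'C', 'D', 'E', 'F']
print(encoded.tolist())      # [2, 0, 5, 1, 4, 3, 0]

# Map numeric predictions back to the original letters
print(le.inverse_transform([0, 5, 2]).tolist())  # ['A', 'F', 'C']
```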

I'm assuming the remaining X variables are all numeric.
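If some of the other columns are not numeric, they have to be encoded before fitting. One common option (not part of the original answer) is one-hot encoding with `pd.get_dummies`, followed by `reindex` so the test matrix has exactly the training columns. A sketch with a hypothetical categorical column `color`:

```python
import pandas as pd

# Hypothetical feature frames: 'size' is numeric, 'color' is categorical
X_train = pd.DataFrame({'size': [1.0, 2.0, 3.0], 'color': ['red', 'blue', 'red']})
X_test = pd.DataFrame({'size': [2.5, 0.5], 'color': ['green', 'blue']})

# One-hot encode each frame separately
X_train_enc = pd.get_dummies(X_train, columns=['color'])
X_test_enc = pd.get_dummies(X_test, columns=['color'])

# Align the test columns with the training columns:
# categories unseen in training (e.g. 'green') are dropped, missing ones are filled with 0
X_test_enc = X_test_enc.reindex(columns=X_train_enc.columns, fill_value=0)

print(X_train_enc.columns.tolist())
print(X_test_enc)
```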

rf = RandomForestClassifier() # <---- Here you create the classifier
rf.fit(X_train, y_train) # <---- Fit the classifier on train data
rf.score(X_train, y_train) # <---- Training accuracy only; this is optimistic, use a held-out split to estimate real accuracy
y_pred = rf.predict(test[X_train.columns]) # <---- Predict on test data, using the same columns in the same order as train
test_pred = test.copy()
test_pred['Outcome'] = le.inverse_transform(y_pred) # <---- Convert 0-5 back to A-F and add it as a new column
test_pred.to_excel(r'path\Test.xlsx', index=False) # <---- Save the result (needs an Excel writer such as openpyxl)
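Putting the steps together, the sketch below runs end to end on synthetic data. It holds out part of the training set to estimate accuracy (the training accuracy from `rf.score(X_train, y_train)` is optimistic for a random forest), refits on all training rows, and writes letter predictions into a copy of the test frame. The random features and the column names `f0` to `f34` are made-up stand-ins for your 35 columns; because the synthetic labels are random, the validation accuracy will be near chance, whereas with your real data it is the number to look at.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

rng = np.random.default_rng(0)
feature_cols = [f'f{i}' for i in range(35)]

# Synthetic stand-ins for the two CSV files: 35 numeric features, Outcome in A-F
train = pd.DataFrame(rng.normal(size=(300, 35)), columns=feature_cols)
train['Outcome'] = rng.choice(list('ABCDEF'), size=300)
test = pd.DataFrame(rng.normal(size=(50, 35)), columns=feature_cols)

X = train[feature_cols]
le = LabelEncoder()
y = le.fit_transform(train['Outcome'])

# Hold out 20% of the training rows to get an honest accuracy estimate
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0, stratify=y)
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_tr, y_tr)
val_acc = rf.score(X_val, y_val)
print('validation accuracy:', val_acc)

# Refit on all training rows, then predict the test set and decode 0-5 back to A-F
rf.fit(X, y)
test_pred = test.copy()
test_pred['Outcome'] = le.inverse_transform(rf.predict(test[feature_cols]))
print(test_pred['Outcome'].value_counts())
```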

Making predictions using the training data
