I have a set of football data in a database, and I am trying to predict match results from it.
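After loading the dataset into a DataFrame, running train_test_split, and fitting the model, how do I predict values for the unseen dataset and return game_id together with the predicted value (FTR)?
As you can see in the code, I have one table (tmp_all_output_id). I select the rows with known results into 'games' and the rows with unknown (not yet played) results into 'predict_games'. For 'predict_games' I also set FTR (full-time result) to -10, because the results of those games are not yet known.
But how do I use the training I have done to predict FTR for the 'predict_games' DataFrame?
I tried predicting with this code, but it always returns 0 (a draw) for FTR, which is clearly not correct.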
import MySQLdb
import pandas as pd
from sklearn.feature_selection import RFE
from sqlalchemy import create_engine
import mysql.connector
from matplotlib import pyplot
mysql_cn= MySQLdb.connect(host='database.rds.amazonaws.com',port=3306,user='username', passwd='password', db='dev')
games = pd.read_sql(
    'SELECT game_id, game_date_id, home_team_id, away_team_id, referee_id, FTR, away_team_travel '
    'FROM dev.tmp_all_output_id WHERE game_id < 6700;', con=mysql_cn)
predict_games = pd.read_sql(
    'SELECT game_id, game_date_id, home_team_id, away_team_id, referee_id, -10 AS FTR, away_team_travel '
    'FROM dev.tmp_all_output_id WHERE game_id > 6700;', con=mysql_cn)
feature_names = ['game_id', 'game_date_id', 'home_team_id', 'away_team_id', 'referee_id', 'away_team_travel']
X = games[feature_names]
y = games['FTR']
# #Create Training and Test Sets and Apply Scaling
from sklearn.model_selection import train_test_split
validation_size = 0.20
seed = 7
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=validation_size, random_state=seed)
from sklearn.ensemble import AdaBoostClassifier
ada = AdaBoostClassifier()
ada.fit(X_train, y_train)
predictions = ada.predict(X_test)
print('Accuracy of AdaBoostClassifier on training set: {:.2f}'.format(ada.score(X_train, y_train)))
print('Accuracy of AdaBoostClassifier on test set: {:.2f}'.format(ada.score(X_test, y_test)))
#cnx = create_engine('mysql+mysqlconnector://username:password@database.rds.amazonaws.com:3306/dev', echo=False)
#testResults.to_sql(name='tmp_all_output_prediction', con=cnx, if_exists = 'replace', index=False)
mysql_cn.close()
I added the following code:
testResults = predict_games[['game_id']]
testResults.is_copy = None
testResults['FTR'] = raw_prediction
However, every predicted value comes back as -1 (away win), which is not correct.
Answer 0 (score: 0)
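Your ada variable is now a trained classifier instance. To use it to classify new data, you need to build the new X in the same format as the training data, with the columns 'game_id', 'game_date_id', 'home_team_id', 'away_team_id', 'referee_id', 'away_team_travel'.
Then you run ada.predict(X), and you're done!
The problem is that you are currently passing only game_id.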
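As a minimal sketch of that fix, the example below fits the classifier and then predicts on the unplayed games using all six feature columns. The DataFrames are synthetic stand-ins for the two pd.read_sql results (the make_games helper and the random data are illustrative assumptions, not part of the question). The label coding assumes 1 means a home win, alongside the 0 (draw) and -1 (away win) values mentioned in the question.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier

feature_names = ['game_id', 'game_date_id', 'home_team_id',
                 'away_team_id', 'referee_id', 'away_team_travel']

rng = np.random.default_rng(0)

def make_games(first_id, n):
    """Hypothetical helper: builds random rows shaped like tmp_all_output_id."""
    return pd.DataFrame({
        'game_id': np.arange(first_id, first_id + n),
        'game_date_id': rng.integers(1, 400, n),
        'home_team_id': rng.integers(1, 21, n),
        'away_team_id': rng.integers(1, 21, n),
        'referee_id': rng.integers(1, 30, n),
        'away_team_travel': rng.integers(0, 500, n),
    })

# Played games with a known result (assumed coding: -1 away, 0 draw, 1 home).
games = make_games(1, 200)
games['FTR'] = rng.choice([-1, 0, 1], size=len(games))

# Unplayed games: no usable FTR, only the feature columns matter here.
predict_games = make_games(6701, 10)

ada = AdaBoostClassifier(random_state=0)
ada.fit(games[feature_names], games['FTR'])

# Pass every feature column, in the same order as in training,
# rather than only game_id.
predicted = ada.predict(predict_games[feature_names])

# .copy() avoids pandas' SettingWithCopyWarning when adding the column.
test_results = predict_games[['game_id']].copy()
test_results['FTR'] = predicted
print(test_results)
```

With the real data, games and predict_games would come from the two queries in the question, and test_results could then be written back with to_sql as in the commented-out lines.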