How can I improve the accuracy of a decision tree classifier?
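I wrote a decision tree in Python using sklearn. To check its accuracy, I split the data into train and test sets. I have tried playing with test_size and random_state, but the accuracy always stays between 0.33 and 0.45 (33%-45%). I don't know what I am doing wrong, so if you can see the problem, please help me.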
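A score that moves with test_size and random_state is typical of a single split on a small dataset, so a cross-validated estimate may be easier to reason about. Below is a minimal sketch, not the code from this post: the data is synthetic (generated with make_classification because the real file is not reproduced here), and the 5 classes, 600 rows and 5 folds are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the concrete data (assumption): 8 features, 5 classes.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=6, n_redundant=0,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)

# 5-fold stratified cross-validation: each row is used for testing exactly once.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv)

print("fold accuracies:", scores.round(3))
print("mean accuracy: %.3f (std %.3f)" % (scores.mean(), scores.std()))
```

The spread between folds gives a rough idea of how much of the 33%-45% range could be split noise rather than a real difference between settings.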
I downloaded the dataset from here (http://archive.ics.uci.edu/ml/datasets/concrete+compressive+strength). The data is in an Excel file with 9 columns and 1300 rows, which I read with pandas.
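I computed a "concrete class", which is a string value, and stored it in a 10th column. That 10th column therefore holds 15 different string values (the "concrete classes"). I want to predict the concrete class from the first 8 columns.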
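For readers who want to reproduce a class column like this, one possible way to turn a numeric strength into string classes is pandas.cut. This is only a sketch under assumptions: the strength values, the bin edges and the class names below are hypothetical, and the post does not say how its 15 classes were actually derived.

```python
import pandas as pd

# Hypothetical compressive strengths in MPa (not taken from the dataset).
strength = pd.Series([4.5, 12.0, 19.9, 20.0, 37.2, 55.0])

# Hypothetical class boundaries and names; the real ones are not given in the post.
edges = [0, 10, 20, 40, 60]
labels = ["class_A", "class_B", "class_C", "class_D"]

# right=False makes each bin closed on the left: [0, 10), [10, 20), ...
concrete_class = pd.cut(strength, bins=edges, labels=labels, right=False).astype(str)
print(concrete_class.tolist())
```

Note that the more classes are cut from the same number of rows, the fewer examples each class has, which generally makes the classification task harder.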
Here is the code:
import numpy as np
import pandas as pd
import xlrd
from sklearn.model_selection import train_test_split
from sklearn import tree
from sklearn.metrics import accuracy_score


def predict_concrete_class(input_data, cement, blast_fur_slug, fly_ash,
                           water, superpl, coarse_aggr, fine_aggr, days):
    # concrete_strenght_class is defined elsewhere (not shown); it returns
    # the data with the concrete class as the last column
    data_for_tree = concrete_strenght_class(input_data)
    variable_list = []
    result_list = []
    for index, row in data_for_tree.iterrows():
        variable = row.tolist()
        variable = variable[0:8]  # first 8 columns are the features
        variable_list.append(variable)
        # positional access; row[-1] on a labelled Series is deprecated
        result_list.append(row.iloc[-1])

    # accuracy of prediction (splitting the dataset)
    var_train, var_test, res_train, res_test = train_test_split(
        variable_list, result_list, test_size=0.3, random_state=42)
    decision_tree = tree.DecisionTreeClassifier()
    decision_tree = decision_tree.fit(var_train, res_train)
    input_values = [cement, blast_fur_slug, fly_ash, water, superpl,
                    coarse_aggr, fine_aggr, days]

    # calculating the accuracy on the held-out test set
    score = decision_tree.score(var_test, res_test)
    score = round(score * 100, 2)
    prediction = decision_tree.predict([input_values])
    prediction = prediction[0]

    accuracy_info = "Accuracy of concrete class prediction: " + str(score) + " %\n"
    prediction_info = ("Prediction of future concrete class after "
                       + str(days) + " days: " + str(prediction))
    info = "\n" + accuracy_info + prediction_info
    return info
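
The row-by-row loop above can probably be replaced by positional column selection, and the split can be stratified so that every class appears in both parts. The following is a rough sketch under assumptions, not a drop-in fix: the DataFrame is synthetic (a stand-in for whatever concrete_strenght_class returns), the column names and the two-class label are made up for illustration, and stratify=y only works if every class has at least two rows.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in (assumption) for the frame returned by the helper:
# 9 numeric columns followed by one string class column.
rng = np.random.default_rng(0)
frame = pd.DataFrame(rng.random((300, 9)), columns=[f"col{i}" for i in range(9)])
frame["concrete_class"] = np.where(frame["col0"] > 0.5, "high", "low")

X = frame.iloc[:, :8]   # first 8 columns as features, no Python loop needed
y = frame.iloc[:, -1]   # last column as the label

# stratify=y keeps the class proportions the same in the train and test parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fixing random_state on the tree makes repeated runs comparable.
clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
print("test accuracy:", round(clf.score(X_test, y_test) * 100, 2), "%")
```

The high score here only reflects that the made-up label is a simple function of one feature; it says nothing about what is achievable on the concrete data with 15 classes.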