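I'm using a SQL DB as the dataset for my Python prediction model, which uses XGBoost to compute feature importance. I can import the dataset, but I want to write code that splits it into X and y: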
from numpy import loadtxt
from xgboost import XGBClassifier
from matplotlib import pyplot
from xgboost import plot_importance
import pyodbc
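# load data from SQL Server
# note: Trusted_Connection=yes selects Windows authentication,
# so the user/password entries are normally ignored
cnxn = pyodbc.connect("Driver={SQL Server};"
                      r"server=MyInstance\CHURN;"
                      "Database=MyDB;"
                      "Trusted_Connection=yes;"
                      "user=sa;"
                      "password=Mypassword;")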
cursor = cnxn.cursor()
# a SQL string spanning several lines needs triple quotes
query = """SELECT TOP 1000 [msno],[msnoid],
[payment_method_id],[payment_plan_days],[plan_list_price],
[actual_amount_paid],[is_auto_renew],[transaction_date],
[membership_expire_date],[is_cancel],[date],[num_25],[num_50],[num_75],
[num_985],[num_100],[num_unq],[total_secs]
FROM [Churn_pred].[dbo].[Features]"""
# execute() returns the cursor itself; fetchall() retrieves the rows
dataset = cursor.execute(query).fetchall()
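Because pyodbc follows the Python DB-API, `execute()` returns the cursor and the rows come from `fetchall()`; those rows can be stacked into a NumPy array and sliced the same way as the CSV data. A minimal sketch, assuming every selected column is numeric (a text ID or a date column would have to be dropped or encoded first) and that the label is the last selected column, as in the CSV code. The helper `rows_to_xy` and the in-memory SQLite table with its three columns are hypothetical stand-ins for `[Churn_pred].[dbo].[Features]`, used only so the sketch runs without SQL Server.

```python
import sqlite3

import numpy as np


def rows_to_xy(cursor, query):
    """Run `query` and split the result into features X and label y.

    Assumes every selected column is numeric and the label is the last one,
    mirroring X = dataset[:, 0:17] and y = dataset[:, 17] for the CSV file.
    """
    rows = cursor.execute(query).fetchall()
    # Each DB-API row is a sequence; stack them into a 2-D float array.
    dataset = np.array([tuple(row) for row in rows], dtype=float)
    X = dataset[:, :-1]  # all columns except the last
    y = dataset[:, -1]   # the last column is the label
    return X, y


# Hypothetical in-memory table standing in for [Churn_pred].[dbo].[Features];
# with SQL Server, pass cnxn.cursor() from pyodbc instead.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE features (payment_plan_days REAL, num_25 REAL, is_cancel REAL)"
)
cur.executemany(
    "INSERT INTO features VALUES (?, ?, ?)",
    [(30, 5, 0), (7, 2, 1), (30, 9, 0)],
)
X, y = rows_to_xy(cur, "SELECT payment_plan_days, num_25, is_cancel FROM features")
print(X.shape, y)
```

With the 18 columns in the query above, X would receive the first 17 and y the last one, which is currently `[total_secs]`; whichever column is the actual target should be placed last in the SELECT list.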
I use the code below to load and split data from a CSV (Excel) dataset, and it works fine; I need to do the same with the SQL DB above.
dataset = loadtxt(r'D:\dataset\half_pima-indians-diabetes for testing.csv',
                  delimiter=",")
# split data into X and y
X = dataset[:,0:17]
y = dataset[:,17]
# fit model on training data
model = XGBClassifier()
model.fit(X, y)
# plot feature importance with XGBoost's helper; the axis labels are set
# after the call so they apply to the figure plot_importance creates
plot_importance(model)
pyplot.xlabel('Smarts')
pyplot.ylabel('Probability')
pyplot.show()
# feature importance
print(model.feature_importances_)
# plot
pyplot.bar(range(len(model.feature_importances_)),
           model.feature_importances_)
pyplot.show()
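The scores in `model.feature_importances_` follow the column order of X, so each one can be paired with its SQL column name to make the output readable. A small sketch; the helper `ranked_importances` and the sample scores and names are hypothetical, and in practice the names could come from the query's column list (for example `[d[0] for d in cursor.description]` with the label column removed).

```python
import numpy as np


def ranked_importances(importances, feature_names):
    """Pair each importance score with its column name, highest first."""
    if len(importances) != len(feature_names):
        raise ValueError("one name per feature column is required")
    order = np.argsort(importances)[::-1]  # indices by descending score
    return [(feature_names[i], float(importances[i])) for i in order]


# Hypothetical scores and names; in the question these would be
# model.feature_importances_ and the selected SQL column names minus the label.
scores = np.array([0.1, 0.6, 0.3])
names = ["payment_plan_days", "num_25", "total_secs"]
for name, score in ranked_importances(scores, names):
    print(f"{name}: {score:.2f}")
```

For the bar chart, `pyplot.xticks(range(len(names)), names, rotation=90)` placed before `pyplot.show()` would put the same names on the x axis.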