How to import a dataset (MSSQL DB) into a Python prediction model and split it

Time: 2018-12-09 16:28:51

Tags: python python-3.x

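I am using a SQL Server database as the dataset for my Python prediction model, which uses XGBoost to compute feature importance. I can import the dataset, but I would like to write code that splits it into features and a target.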

    from numpy import loadtxt
    from xgboost import XGBClassifier
    from matplotlib import pyplot
    from xgboost import plot_importance
    import pyodbc

    # load data
    cnxn = pyodbc.connect("Driver={SQL Server};"
                          r"server=MyInstance\CHURN;"  # raw string keeps the backslash literal
                          "Database=MyDB;"
                          "Trusted_Connection=yes;"
                          "user=sa;"
                          "password=Mypassword;")

    cursor = cnxn.cursor()
    # execute() returns a cursor, not the rows themselves
    dataset = cursor.execute("""
        SELECT TOP 1000 [msno],[msnoid],
            [payment_method_id],[payment_plan_days],[plan_list_price],
            [actual_amount_paid],[is_auto_renew],[transaction_date],
            [membership_expire_date],[is_cancel],[date],[num_25],[num_50],[num_75],
            [num_985],[num_100],[num_unq],[total_secs]
        FROM [Churn_pred].[dbo].[Features]""")
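
The rows still have to be fetched from the cursor before they can be split. A minimal sketch of that step, using the standard-library `sqlite3` module as a stand-in for `pyodbc` so that it runs without a SQL Server instance (both follow the Python DB-API, so `execute`, `fetchall` and `description` are used the same way). The in-memory table, its three columns, and the values are invented for illustration:

```python
import sqlite3  # stand-in for pyodbc; the cursor calls below are DB-API calls

# Hypothetical miniature of the Features table (assumed columns and values).
cnxn = sqlite3.connect(":memory:")
cnxn.execute(
    "CREATE TABLE Features "
    "(payment_plan_days INTEGER, plan_list_price INTEGER, is_cancel INTEGER)"
)
cnxn.executemany(
    "INSERT INTO Features VALUES (?, ?, ?)",
    [(30, 149, 0), (30, 99, 1), (7, 0, 0)],
)

cursor = cnxn.cursor()
cursor.execute(
    "SELECT payment_plan_days, plan_list_price, is_cancel "
    "FROM Features ORDER BY rowid"
)

# Column names come from cursor.description; fetchall() materialises the rows.
columns = [col[0] for col in cursor.description]
rows = [tuple(row) for row in cursor.fetchall()]

print(columns)
print(rows)
```

With `pyodbc` the same two lines (`cursor.description` and `cursor.fetchall()`) should apply to the query above, giving one tuple-like row per record.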

I use the code below to load and split data from an Excel (CSV) dataset, and it works fine. I need to do the same with the SQL database above.

    # load data (raw string so the backslashes in the path stay literal)
    dataset = loadtxt(r'D:\dataset\half_pima-indians-diabetes for testing.csv',
                      delimiter=",")
    # split data into X and y
    X = dataset[:, 0:17]
    y = dataset[:, 17]
    # fit the model on all of the data (no separate training set)
    model = XGBClassifier()
    model.fit(X, y)
    # built-in importance plot; the labels are set afterwards so they apply to it
    plot_importance(model)
    pyplot.xlabel('Smarts')
    pyplot.ylabel('Probability')
    pyplot.show()
    # feature importance
    print(model.feature_importances_)
    # plot
    pyplot.bar(range(len(model.feature_importances_)),
               model.feature_importances_)
    pyplot.show()
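
The same column split can be applied to rows fetched from a database cursor. A rough sketch, assuming `rows` is a list of row tuples such as `cursor.fetchall()` returns, that the last column is the target (mirroring the CSV code above), and that date values should be turned into ordinal day numbers. `to_number` and `split_rows` are hypothetical helper names, and the sample rows are invented. Text identifier columns such as `msno` would need to be left out of the query or encoded separately, since they cannot be converted with `float`:

```python
from datetime import date, datetime

def to_number(value):
    """Convert one database value to a float (assumed encoding, adjust as needed)."""
    if isinstance(value, (date, datetime)):
        return float(value.toordinal())  # dates become day counts
    return float(value)

def split_rows(rows, target_index=-1):
    """Split row tuples into a feature matrix X and a target list y."""
    X, y = [], []
    for row in rows:
        values = [to_number(v) for v in row]
        y.append(values.pop(target_index))  # remove the target from the features
        X.append(values)
    return X, y

# Invented sample rows: two numeric features, one date, then the target.
rows = [(30, 149, date(2017, 3, 31), 0), (7, 0, date(2017, 2, 1), 1)]
X, y = split_rows(rows)
print(X)
print(y)
```

The resulting lists can be wrapped with `numpy.array(X)` and `numpy.array(y)` before being passed to `model.fit`, as in the CSV version.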

0 answers:

No answers