Using a trained Gaussian Naive Bayes model from Python in SQL Server

Asked: 2018-11-12 21:37:37

Tags: python sql sql-server machine-learning scikit-learn

I am using the following code to train a model and store it in a varbinary(max) column of a table.

How can I use the stored model with new incoming data?

DECLARE @trained_model varbinary(max);

exec sp_execute_external_script
@language = N'Python'
,@script=
N'
import pandas as pd
import matplotlib.pyplot as plt
import pickle
from sklearn.cross_validation import train_test_split
from sklearn.preprocessing import Imputer
from sklearn.naive_bayes import GaussianNB
from sklearn import metrics

df = InputDataSet

feature_col_names = [''num_preg'', ''glucose_conc'', ''diastolic_bp'', ''skin_thickness'', ''insulin'', ''bmi'', ''diab_pred'', ''age'']
predicted_class_names = [''diabetes'']

X = df[feature_col_names].values     # predictor feature columns (8 X m)
y = df[predicted_class_names].values # predicted class (1=true, 0=false) column (1 X m)
split_test_size = 0.30

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=split_test_size, random_state=42) 

nb_model = GaussianNB()
logitObj = nb_model.fit(X_train, y_train.ravel())
trained_model = pickle.dumps(logitObj)
'
,
@input_data_1 = N'SELECT [num_preg],[glucose_conc],[diastolic_bp],[skin_thickness],[insulin],[bmi],[diab_pred],[age],[diabetes] FROM [dbo].[dataset_1]',
@params = N'@trained_model varbinary(max) OUTPUT',
@trained_model = @trained_model OUTPUT;

INSERT INTO model_binary ([model]) VALUES (@trained_model);

Now I am trying to use my model to predict results for a new dataset, but after changing the query I get an error.

DECLARE @model1 varbinary(max) = (select top 1 model from model_binary);

EXEC sp_execute_external_script
  @language = N'Python',
  @script = N'
import pickle;
import numpy;
from sklearn import metrics

mod = pickle.loads(model1)
X = InputDataSet[["num_preg","glucose_conc","diastolic_bp","skin_thickness","insulin","bmi","diab_pred","age"]]
y = numpy.ravel(InputDataSet[["diabetes"]])

probArray = mod.predict_proba(X)
probList = []
for i in range(len(probArray)):
  probList.append((probArray[i])[1])

probArray = numpy.asarray(probList)
fpr, tpr, thresholds = metrics.roc_curve(y, probArray)
aucResult = metrics.auc(fpr, tpr)
print ("AUC on testing data is: " + str(aucResult))

OutputDataSet = pandas.DataFrame(data = probList, columns = ["predictions"])
',  
  @input_data_1 = N'SELECT [num_preg],[glucose_conc],[diastolic_bp],[skin_thickness],[insulin],[bmi],[diab_pred],[age] FROM [dbo].[dataset_1_to_predict]',
  @input_data_1_name = N'InputDataSet',
  @params = N'@model1 varbinary(max)',
  @model1 = @model1
WITH RESULT SETS ((Score float));

Is there a simple solution, or is my overall understanding wrong?
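Two things in the scoring script look like probable causes of the error: it reads a `diabetes` column that the input query over `dataset_1_to_predict` does not select, and it calls `pandas.DataFrame` without importing `pandas`. Since new data has no labels, the AUC step can simply be dropped. Below is a minimal sketch of the scoring logic that runs outside SQL Server. The synthetic training data, the `model_bytes` variable (standing in for the varbinary(max) value passed as `model1`), and the `new_rows` frame (standing in for `InputDataSet`) are assumptions made for illustration only.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB

feature_cols = ["num_preg", "glucose_conc", "diastolic_bp", "skin_thickness",
                "insulin", "bmi", "diab_pred", "age"]

# Assumption: synthetic stand-in for dbo.dataset_1, only to obtain a fitted model.
rng = np.random.RandomState(42)
train = pd.DataFrame(rng.rand(100, len(feature_cols)), columns=feature_cols)
train["diabetes"] = (train["glucose_conc"] > 0.5).astype(int)

model = GaussianNB().fit(train[feature_cols].values, train["diabetes"].values)
model_bytes = pickle.dumps(model)  # plays the role of the stored varbinary(max) value

# Assumption: stand-in for InputDataSet from dbo.dataset_1_to_predict (no label column).
new_rows = pd.DataFrame(rng.rand(5, len(feature_cols)), columns=feature_cols)

# Scoring: restore the model and keep the probability of the positive class (column 1).
restored = pickle.loads(model_bytes)
positive_proba = restored.predict_proba(new_rows[feature_cols].values)[:, 1]

# Single-column frame, matching WITH RESULT SETS ((Score float)).
OutputDataSet = pd.DataFrame({"predictions": positive_proba})
print(OutputDataSet)
```

Inside `sp_execute_external_script`, only the part from `pickle.loads` onward would be needed, with `model1` in place of `model_bytes`, `InputDataSet` in place of `new_rows`, and an `import pandas as pd` at the top of the script.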

0 answers:

No answers yet.