How do I run prediction (or scoring) in SQL Server's R integration using the sp_execute_external_script procedure?

Asked: 2016-06-24 23:19:25

Tags: r stored-procedures enterprise sql-server-2016

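I am working through the DeepDive Data Science tutorial on MSDN, which uses credit card fraud data, in my SQL Server 2016 RTM virtual machine.
I now want to replicate the tutorial using the T-SQL R integration and stored procedures. I was able to run the linear and logistic regression models, print the results as messages, and create stored procedures for both. However, I am confused about how to write the prediction script in R when using the sp_execute_external_script procedure.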

Here is what I have for the linear and logistic regression models.

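Edit: the scripts have been updated to reflect the changes I made after reviewing the comments and answers, with help from here and here.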
  

Summary statistics for the fraud data:

CREATE PROC summary_proc
AS
begin
exec sp_execute_external_script
    @language = N'R',
    @script = N'
                sumOut <- rxSummary(
                                    formula = ~gender + balance + numTrans + numIntlTrans + creditLine, 
                                    data = ccFraud
                                    )
                print(sumOut)
                OutputDataset <- data.frame(serialize(sumOut,NULL))
                ',
    @input_data_1 = N'select * from [DeepDive].[db_datareader].[ccFraudSmall]',
    @input_data_1_name = N'ccFraud',
    @output_data_1_name = N'OutputDataset'
    with result sets ((summary varbinary(max)));
END;
  

Linear regression model:

CREATE PROC linear_model
AS
begin
exec sp_execute_external_script
    @language = N'R',
    @script = N'
                    linModObj <- rxLinMod(
                                            balance ~ gender + creditLine,  
                                            data = ccFraud
                                            ) ;
                    print(linModObj)
                    OutputDataset <- data.frame(serialize(linModObj, NULL)); 
                ',
    @input_data_1 = N'select * from [DeepDive].[db_datareader].[ccFraud10]',
    @input_data_1_name = N'ccFraud',
    @output_data_1_name = N'OutputDataset'
    with result sets ((linear_model varbinary(max)));
END;
  

Logistic regression model:

create table logit_trained_model (
model varbinary (255)
);
CREATE PROC logit_model
AS
begin
insert into logit_trained_model
exec sp_execute_external_script
    @language = N'R',
    @script = N'
                    logitObj <- rxLogit(
                                        fraudRisk ~ state + gender + cardholder + balance + numTrans + numIntlTrans + creditLine, 
                                        data = ccFraud,
                                        dropFirst = TRUE
                                        );
                    print(logitObj)
                    OutputDataset <- data.frame(serialize(logitObj, NULL));  
                ',
    @input_data_1 = N'select * from [DeepDive].[db_datareader].[ccFraud10]',
    @input_data_1_name = N'ccFraud',
    @output_data_1_name = N'OutputDataset'
    --with result sets ((logit_model varbinary(max))); 
END;

I want to predict scores based on the logistic regression model. This is what I have now:

  

Prediction/scoring:

CREATE PROC prediction
AS
begin
DECLARE @lmodel2 varbinary(max) = (SELECT top 1 model  
                                        FROM logit_trained_model);
exec sp_execute_external_script
    @language = N'R',
    @script = N'
                    logit_model_obj <- unserialize(as.raw(model));
                    print(summary(logit_model_obj))
                    OutputDataset <- rxPredict(
                                            modelObject = logit_model_obj,   
                                            data = ccFraudScore,        
                                            outData = NULL,     
                                            predVarNames = "ccFraudLogitScore",   
                                            type = "link",      
                                            writeModelVars = TRUE,
                                            extraVarsToWrite = "custID",        
                                            overwrite = TRUE
                                            ) ;
                    str(OutputDataset)
                    print(OutputDataset)
                ',
    @input_data_1 = N'select * from [DeepDive].[db_datareader].[ccFraudScore10]',
    @input_data_1_name = N'ccFraudScore',
    @output_data_1_name = N'OutputDataset',
    @params = N'@model varbinary(max)',  
    @model = @lmodel2  
    WITH RESULT SETS ((Score float));
END;

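Previously, before I edited the scripts, the error was: object 'logitObj' not found. That happened because rxPredict referred to logitObj outside the script that created it. I have since changed my scripts so that logitObj is inserted into a table and read back from that table for rxPredict. All the scripts above now reflect this change. But this is the new error I am facing: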

Msg 39004, Level 16, State 20, Line 76 A 'R' script error
occurred during execution of 'sp_execute_external_script' with HRESULT
0x80004004. Msg 39019, Level 16, State 1, Line 76 An external script
error occurred:  Error in unserialize(as.raw(model)) : read error
Calls: source -> withVisible -> eval -> eval -> unserialize

Error in ScaleR.  Check the output for more information. Error in
eval(expr, envir, enclos) :    Error in ScaleR.  Check the output for
more information. Calls: source -> withVisible -> eval -> eval ->
.Call Execution halted Msg 11536, Level 16, State 1, Line 78 EXECUTE
statement failed because its WITH RESULT SETS clause specified 1
result set(s), but the statement only sent 0 result set(s) at run
time. 


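As far as I understand, R cannot read the variable @model. Just to check, I ran the query behind the variable @lmodel2 (SELECT top 1 model FROM logit_trained_model) to see whether it returns anything. Apparently it does not: the table has just a single column named model, with no data in it.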

How can I get this to work?

1 Answer:

Answer 0 (score: 1)

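You can return the trained model in serialized form, using either output_data_1 or an output parameter, and store it in a database table. Then pass the model back to the prediction script as an input parameter.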
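As a rough illustration of the output-parameter route, here is a minimal sketch. It is not taken from the tutorial verbatim. The procedure name train_logit_model and the variable @trained_model are hypothetical, and the sketch assumes the model column is widened to varbinary(max) so that a serialized model of any size fits in one row.

```sql
-- Assumption: widen the column so the whole serialized model fits in one value.
ALTER TABLE logit_trained_model ALTER COLUMN model varbinary(max);
GO  -- batch separator for SSMS/sqlcmd; CREATE PROCEDURE must start a batch

-- Hypothetical procedure name, used only for this sketch.
CREATE PROCEDURE train_logit_model
AS
BEGIN
    DECLARE @trained_model varbinary(max);

    EXEC sp_execute_external_script
        @language = N'R',
        @script = N'
            logitObj <- rxLogit(
                fraudRisk ~ state + gender + cardholder + balance +
                            numTrans + numIntlTrans + creditLine,
                data = ccFraud,
                dropFirst = TRUE)
            # Serialize the fitted model into a single raw vector; it is
            # returned to T-SQL through the OUTPUT parameter of the same name.
            trained_model <- as.raw(serialize(logitObj, connection = NULL))
        ',
        @input_data_1 = N'select * from [DeepDive].[db_datareader].[ccFraud10]',
        @input_data_1_name = N'ccFraud',
        @params = N'@trained_model varbinary(max) OUTPUT',
        @trained_model = @trained_model OUTPUT;

    -- Store the model as one row holding one varbinary(max) value.
    INSERT INTO logit_trained_model (model) VALUES (@trained_model);
END;
GO

-- Creating the procedure does not train anything; it has to be executed
-- once before the table contains a model.
EXEC train_logit_model;
```

Note that the table stays empty until the training procedure has actually been executed, which would also explain a SELECT on it returning no rows.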

See the In-Database Advanced Analytics for SQL Developers tutorial, specifically steps 5. Train and Save a Model using T-SQL and 6. Operationalize the Model.
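For the scoring half, passing the stored model back in as an input parameter might look like the sketch below. It assumes logit_trained_model already holds one row with a complete serialized model, and it reuses the variable and dataset names from the question. writeModelVars and extraVarsToWrite are left out here so that the output has exactly one column, matching the single column declared in WITH RESULT SETS.

```sql
-- Assumption: the table already contains a complete serialized model.
DECLARE @lmodel2 varbinary(max) =
    (SELECT TOP 1 model FROM logit_trained_model);

EXEC sp_execute_external_script
    @language = N'R',
    @script = N'
        # "model" is the R-side name of the @model input parameter.
        logit_model_obj <- unserialize(as.raw(model))
        # Score the input rows; with outData = NULL and a data frame as
        # input, rxPredict returns a data frame of predictions.
        OutputDataset <- rxPredict(
            modelObject = logit_model_obj,
            data = ccFraudScore,
            outData = NULL,
            predVarNames = "ccFraudLogitScore",
            type = "link",
            writeModelVars = FALSE)
    ',
    @input_data_1 = N'select * from [DeepDive].[db_datareader].[ccFraudScore10]',
    @input_data_1_name = N'ccFraudScore',
    @output_data_1_name = N'OutputDataset',
    @params = N'@model varbinary(max)',
    @model = @lmodel2
WITH RESULT SETS ((Score float));
```

If extra columns such as custID are written to the output, WITH RESULT SETS has to list every returned column, otherwise the EXECUTE statement fails on the mismatch.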