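I am working with the credit card fraud data from the DeepDive Data Science tutorial on MSDN, on my SQL Server 2016 RTM virtual machine.
I now want to replicate the tutorial using R integrated with T-SQL and stored procedures. I can run the linear and logistic regression models, print the results as messages, and create stored procedures for both. However, I am confused about how to write the prediction script in R when using the sp_execute_external_script procedure.
Here is what I have for the linear and logistic regression models.
I have edited the scripts to reflect the changes I made after reviewing the comments/answers from here and here.
Summary statistics for the fraud data: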
    CREATE PROC summary_proc
    AS
    BEGIN
        EXEC sp_execute_external_script
            @language = N'R',
            @script = N'
                sumOut <- rxSummary(
                    formula = ~gender + balance + numTrans + numIntlTrans + creditLine,
                    data = ccFraud
                )
                print(sumOut)
                OutputDataset <- data.frame(serialize(sumOut, NULL))
            ',
            @input_data_1 = N'select * from [DeepDive].[db_datareader].[ccFraudSmall]',
            @input_data_1_name = N'ccFraud',
            @output_data_1_name = N'OutputDataset'
        WITH RESULT SETS ((summary varbinary(max)));
    END;
Linear regression model:
    CREATE PROC linear_model
    AS
    BEGIN
        EXEC sp_execute_external_script
            @language = N'R',
            @script = N'
                linModObj <- rxLinMod(
                    balance ~ gender + creditLine,
                    data = ccFraud
                )
                print(linModObj)
                OutputDataset <- data.frame(serialize(linModObj, NULL))
            ',
            @input_data_1 = N'select * from [DeepDive].[db_datareader].[ccFraud10]',
            @input_data_1_name = N'ccFraud',
            @output_data_1_name = N'OutputDataset'
        WITH RESULT SETS ((linear_model varbinary(max)));
    END;
Logistic regression model:
    CREATE TABLE logit_trained_model (
        model varbinary(255)
    );

    CREATE PROC logit_model
    AS
    BEGIN
        INSERT INTO logit_trained_model
        EXEC sp_execute_external_script
            @language = N'R',
            @script = N'
                logitObj <- rxLogit(
                    fraudRisk ~ state + gender + cardholder + balance + numTrans + numIntlTrans + creditLine,
                    data = ccFraud,
                    dropFirst = TRUE
                )
                print(logitObj)
                OutputDataset <- data.frame(serialize(logitObj, NULL))
            ',
            @input_data_1 = N'select * from [DeepDive].[db_datareader].[ccFraud10]',
            @input_data_1_name = N'ccFraud',
            @output_data_1_name = N'OutputDataset'
        --with result sets ((logit_model varbinary(max)));
    END;
I want to predict scores based on the logistic regression model.
This is what I have so far:
Prediction/scoring:
    CREATE PROC prediction
    AS
    BEGIN
        DECLARE @lmodel2 varbinary(max) = (SELECT TOP 1 model
                                           FROM logit_trained_model);
        EXEC sp_execute_external_script
            @language = N'R',
            @script = N'
                logit_model_obj <- unserialize(as.raw(model))
                print(summary(logit_model_obj))
                OutputDataset <- rxPredict(
                    modelObject = logit_model_obj,
                    data = ccFraudScore,
                    outData = NULL,
                    predVarNames = "ccFraudLogitScore",
                    type = "link",
                    writeModelVars = TRUE,
                    extraVarsToWrite = "custID",
                    overwrite = TRUE
                )
                str(OutputDataset)
                print(OutputDataset)
            ',
            @input_data_1 = N'select * from [DeepDive].[db_datareader].[ccFraudScore10]',
            @input_data_1_name = N'ccFraudScore',
            @output_data_1_name = N'OutputDataset',
            @params = N'@model varbinary(max)',
            @model = @lmodel2
        WITH RESULT SETS ((Score float));
    END;
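Previously, before I edited the scripts, the error was: object 'logitObj' not found. That happened because the rxPredict script referred to logitObj, which had been created in a different script and did not exist in the rxPredict session. I have changed my scripts so that logitObj is inserted into a table, and the rxPredict script reads the model from that table.
All of the scripts above now reflect this change. But this is the new error I am facing: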
    Msg 39004, Level 16, State 20, Line 76
    A 'R' script error occurred during execution of 'sp_execute_external_script' with HRESULT 0x80004004.
    Msg 39019, Level 16, State 1, Line 76
    An external script error occurred:
    Error in unserialize(as.raw(model)) : read error
    Calls: source -> withVisible -> eval -> eval -> unserialize
    Error in ScaleR. Check the output for more information.
    Error in eval(expr, envir, enclos) :
      Error in ScaleR. Check the output for more information.
    Calls: source -> withVisible -> eval -> eval -> .Call
    Execution halted
    Msg 11536, Level 16, State 1, Line 78
    EXECUTE statement failed because its WITH RESULT SETS clause specified 1 result set(s), but the statement only sent 0 result set(s) at run time.
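As far as I understand, R is unable to read the variable @model. Just to check, I ran the query behind the variable @lmodel2 (SELECT top 1 model FROM logit_trained_model) to see whether it brings anything back. Apparently, it does not. The table just has a column called model with no data in it.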
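A slightly more informative version of that check looks at the row count and the stored byte length rather than the column value itself. This is only a minimal sketch against the logit_trained_model table defined above; the column aliases are illustrative.

```sql
-- stored_models = 0 means nothing has been inserted yet. Note that
-- CREATE PROC only defines logit_model; the INSERT ... EXEC inside it
-- runs only when the procedure itself is executed (EXEC logit_model).
-- model_bytes reports the size of the largest stored model, or NULL
-- when the table is empty.
SELECT COUNT(*)               AS stored_models,
       MAX(DATALENGTH(model)) AS model_bytes
FROM logit_trained_model;
```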
How do I make this work?
Answer 0 (score: 1)
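You can use output_data_1 or an output parameter to return the trained model in serialized form and store it in a database table. Then pass the model back into the prediction script as an input parameter.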
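As a rough sketch of the training half using an output parameter: the procedure name train_logit_model and the parameter name @trained_model are hypothetical, and the example assumes the logit_trained_model table from the question already exists. The column is widened to varbinary(max) because a serialized model is generally far larger than 255 bytes. GO is the batch separator used by SSMS and sqlcmd.

```sql
-- Assumption: logit_trained_model exists as defined in the question.
-- Widen the column so it can hold a complete serialized model.
ALTER TABLE logit_trained_model ALTER COLUMN model varbinary(max);
GO

-- Hypothetical procedure name; trains the model and stores it in the table.
CREATE PROCEDURE train_logit_model
AS
BEGIN
    DECLARE @trained varbinary(max);

    EXEC sp_execute_external_script
        @language = N'R',
        @script = N'
            logitObj <- rxLogit(
                fraudRisk ~ state + gender + cardholder + balance + numTrans + numIntlTrans + creditLine,
                data = ccFraud,
                dropFirst = TRUE
            )
            # serialize() returns a raw vector, which maps to varbinary in SQL Server
            trained_model <- as.raw(serialize(logitObj, connection = NULL))
        ',
        @input_data_1 = N'select * from [DeepDive].[db_datareader].[ccFraud10]',
        @input_data_1_name = N'ccFraud',
        @params = N'@trained_model varbinary(max) OUTPUT',
        @trained_model = @trained OUTPUT;

    INSERT INTO logit_trained_model (model) VALUES (@trained);
END;
GO

-- Creating the procedure does not run it; the table stays empty until this executes.
EXEC train_logit_model;
```

The output parameter keeps the model as a single varbinary value, so there is no dependence on how a data frame of raw bytes is turned into result-set rows.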
See the In-Database Advanced Analytics for SQL Developers tutorial, in particular steps 5. Train and Save a Model using T-SQL and 6. Operationalize the Model.
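The scoring half might then look like the sketch below. It assumes logit_trained_model.model is a varbinary(max) column that already holds one complete serialized model; the procedure name predict_logit_scores is hypothetical. It keeps type = "link" from the question and returns only the score column, so that the data frame matches the single column declared in WITH RESULT SETS.

```sql
-- Hypothetical procedure name; scores ccFraudScore10 with the stored model.
CREATE PROCEDURE predict_logit_scores
AS
BEGIN
    -- Assumption: the table holds a complete model in a varbinary(max) column.
    DECLARE @lmodel2 varbinary(max) = (SELECT TOP 1 model FROM logit_trained_model);

    EXEC sp_execute_external_script
        @language = N'R',
        @script = N'
            # The @model parameter arrives in R as the variable "model"
            logit_model_obj <- unserialize(as.raw(model))
            # With a data frame as input and outData = NULL, rxPredict returns a data frame
            OutputDataset <- rxPredict(
                modelObject = logit_model_obj,
                data = ccFraudScore,
                outData = NULL,
                predVarNames = "ccFraudLogitScore",
                type = "link"
            )
        ',
        @input_data_1 = N'select * from [DeepDive].[db_datareader].[ccFraudScore10]',
        @input_data_1_name = N'ccFraudScore',
        @output_data_1_name = N'OutputDataset',
        @params = N'@model varbinary(max)',
        @model = @lmodel2
    WITH RESULT SETS ((Score float));
END;
```

If extra columns such as custID are written to the output (for example through extraVarsToWrite or writeModelVars), the WITH RESULT SETS clause needs one matching column definition for each of them.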