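Cumulative performance measures using R embedded in SQL

Date: 2018-04-04 15:16:36

Tags: sql r sql-server

Forgive me, I'm new to R and am just looking at the options currently available in our SQL Server 2016 environment.

We currently have a requirement to provide cumulative performance returns. Below is a sample dataset: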

FundID  Date        FundReturn
ABC     1987-10-31  0
ABC     1987-11-30  -9.28669
ABC     1987-12-31  3.08304
ABC     1988-01-31  -3.00125
ABC     1988-02-29  0.61238
ABC     1988-03-31  4.29258
ABC     1988-04-30  0.13697
ABC     1988-05-31  2.57786
ABC     1988-06-30  2.36947
ABC     1988-07-31  0.57114
ABC     1988-08-31  -1.21550
ABC     1988-09-30  7.09027
ABC     1988-10-31  3.45807
ABC     1988-11-30  1.12679

We need to take this dataset and apply a cumulative performance return measure to it, so that the dataset looks like this:

FundID  Date        FundReturn      FundReturnCumu100   FundReturnCumu0
ABC     1987-10-31  0               1                   0
ABC     1987-11-30  -9.28669        0.9071331           -0.0928669
ABC     1987-12-31  3.08304         0.935100376         -0.064899624
ABC     1988-01-31  -3.00125        0.907035676         -0.092964324
ABC     1988-02-29  0.61238         0.912590181         -0.087409819
ABC     1988-03-31  4.29258         0.951763845         -0.048236155
ABC     1988-04-30  0.13697         0.953067476         -0.046932524
ABC     1988-05-31  2.57786         0.977636221         -0.022363779
ABC     1988-06-30  2.36947         1.000801018         0.000801018
ABC     1988-07-31  0.57114         1.006516993         0.006516993
ABC     1988-08-31  -1.2155         0.994282779         -0.005717221
ABC     1988-09-30  7.09027         1.064780113         0.064780113
ABC     1988-10-31  3.45807         1.101600954         0.101600954
ABC     1988-11-30  1.12679         1.114013684         0.114013684
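Written out, the two derived columns are a running product of monthly growth factors. In the notation below (introduced here for illustration, not taken from the question), r_i is the FundReturn in percent for month i, C^{100}_t is FundReturnCumu100 and C^{0}_t is FundReturnCumu0:

```latex
C^{100}_t = \prod_{i=1}^{t} \left(1 + \frac{r_i}{100}\right)
          = \exp\!\left(\sum_{i=1}^{t} \ln\!\left(1 + \frac{r_i}{100}\right)\right),
\qquad
C^{0}_t = C^{100}_t - 1
```

As a check against the table: C^{100}_3 = 0.9071331 × (1 + 3.08304/100) = 0.9071331 × 1.0308304 ≈ 0.935100376, which matches the 1987-12-31 row.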

I can produce this in SQL with the following code:

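SELECT
        FundID
    ,   [Date]
    ,   FundReturn

        -- Running product of the growth factors (FundReturn+100)/100, computed as EXP(SUM(LOG(...))).
        -- NULLIF turns a factor of exactly 1 into NULL; ISNULL maps an all-NULL running sum back to 1.
        -- PARTITION BY FundID restarts the running product for each fund.
    ,   ISNULL  (
                    EXP(SUM(LOG(ABS(NULLIF((FundReturn+100)/100, 1))))
                        OVER(PARTITION BY FundID ORDER BY [Date] ROWS UNBOUNDED PRECEDING))
                ,1)                                                     AS FundReturnCumu100

    ,   ISNULL  (
                    EXP(SUM(LOG(ABS(NULLIF((FundReturn+100)/100, 1))))
                        OVER(PARTITION BY FundID ORDER BY [Date] ROWS UNBOUNDED PRECEDING))
                ,1)-1                                                   AS FundReturnCumu0

FROM #Worktable
ORDER BY FundID, [Date]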
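A running product only makes sense within a single fund: it has to restart whenever FundID changes (in T-SQL, that is the job of PARTITION BY FundID in the OVER clause). The Python sketch below illustrates that per-fund reset outside SQL Server. The second fund "XYZ" and its 5% return are made-up illustration data, and the function name cumulative_by_fund is hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows (FundID, Date, FundReturn in percent); fund "XYZ" is made up for illustration.
rows = [
    ("ABC", "1987-10-31", 0.0),
    ("ABC", "1987-11-30", -9.28669),
    ("ABC", "1987-12-31", 3.08304),
    ("XYZ", "1987-10-31", 0.0),
    ("XYZ", "1987-11-30", 5.0),
]


def cumulative_by_fund(rows):
    """Running product of (1 + r/100) that restarts for every FundID,
    mirroring OVER (PARTITION BY FundID ORDER BY [Date])."""
    out = []
    # Sort by fund, then by ISO date string, so each fund's months are contiguous and in order.
    ordered = sorted(rows, key=itemgetter(0, 1))
    for fund_id, fund_rows in groupby(ordered, key=itemgetter(0)):
        growth = 1.0  # reset the running product at the start of each fund
        for _, date, r in fund_rows:
            growth *= 1 + r / 100
            # (FundID, Date, FundReturn, FundReturnCumu100, FundReturnCumu0)
            out.append((fund_id, date, r, growth, growth - 1))
    return out


for row in cumulative_by_fund(rows):
    print(row)
```

Without the reset, the first XYZ row would inherit ABC's running product instead of starting again at 1.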

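I would also like to test whether the same result can be obtained with an R function embedded in a SQL stored procedure. The maths above is essentially the running product of the performance returns over the time series, so is there a product function I could use to produce the same result set?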
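Base R does have a running-product function, cumprod(), so cumprod((FundReturn + 100)/100) should give the same FundReturnCumu100 column as the exp(cumsum(log(...))) form. SQL Server 2016 R Services runs R, not Python, so the sketch below is not a drop-in replacement; it is only a language-neutral check, using the sample data above, that a direct running product and the exp-of-summed-logs trick agree.

```python
import math
from itertools import accumulate
from operator import mul

# Monthly FundReturn values (in percent) for fund ABC, copied from the sample data.
fund_returns = [0, -9.28669, 3.08304, -3.00125, 0.61238, 4.29258, 0.13697,
                2.57786, 2.36947, 0.57114, -1.21550, 7.09027, 3.45807, 1.12679]

# Growth factor for each month: a return of r percent multiplies the value by (1 + r/100).
factors = [1 + r / 100 for r in fund_returns]

# FundReturnCumu100: direct running product of the factors (the role cumprod() plays in R).
cumu100 = list(accumulate(factors, mul))

# The same column via exp(running sum of logs), the trick used in the SQL query.
cumu100_via_logs = [math.exp(s) for s in accumulate(math.log(f) for f in factors)]

# FundReturnCumu0: cumulative return as a fraction (growth minus 1).
cumu0 = [c - 1 for c in cumu100]

# Both approaches agree up to floating-point rounding.
for direct, via_logs in zip(cumu100, cumu100_via_logs):
    assert math.isclose(direct, via_logs, rel_tol=1e-12)

for c100, c0 in zip(cumu100, cumu0):
    print(f"{c100:.9f}  {c0:.9f}")
```

The direct product is the simpler of the two; the log form exists mainly because T-SQL has a windowed SUM but no windowed product aggregate.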

Edit: So far I have used sp_execute_external_script to return the base dataset, as follows:

EXEC sp_execute_external_script
        @language       =   N'R'
    ,   @script         =   N'OutputDataSet<-InputDataSet'
    ,   @input_data_1   =   N'  SELECT * 
                                FROM [InMemory].[dbo].[CumulativePerformanceTest] 
                                ORDER BY [FundID],[Date]'

WITH RESULT SETS    (
                    (
                            [FundID]            NVARCHAR(50)
                        ,   [Date]              DATE
                        ,   [FundReturn]        NVARCHAR(255)
                    )
                    );


GO

What do I need to change in the above to apply the FundReturnCumu100 and FundReturnCumu0 calculations in R?

Thanks

2 Answers:

Answer 0 (score: 10)

We can convert it to dplyr code:
library(dplyr)
df1 %>% 
   arrange(FundID, Date) %>%
   mutate(FundReturnCumu100 = exp(cumsum(log(abs((FundReturn + 100)/100)))), 
          FundReturnCumu0 = FundReturnCumu100 - 1)
# FundID       Date FundReturn FundReturnCumu100 FundReturnCumu0
#1     ABC 1987-10-31    0.00000         1.0000000    0.0000000000
#2     ABC 1987-11-30   -9.28669         0.9071331   -0.0928669000
#3     ABC 1987-12-31    3.08304         0.9351004   -0.0648996237
#4     ABC 1988-01-31   -3.00125         0.9070357   -0.0929643237
#5     ABC 1988-02-29    0.61238         0.9125902   -0.0874098186
#6     ABC 1988-03-31    4.29258         0.9517638   -0.0482361550
#7     ABC 1988-04-30    0.13697         0.9530675   -0.0469325241
#8     ABC 1988-05-31    2.57786         0.9776362   -0.0223637789
#9     ABC 1988-06-30    2.36947         1.0008010    0.0008010181
#10    ABC 1988-07-31    0.57114         1.0065170    0.0065169930
#11    ABC 1988-08-31   -1.21550         0.9942828   -0.0057172210
#12    ABC 1988-09-30    7.09027         1.0647801    0.0647801126
#13    ABC 1988-10-31    3.45807         1.1016010    0.1016009542
#14    ABC 1988-11-30    1.12679         1.1140137    0.1140136836

Answer 1 (score: 7)

After a lot of digging around on Google, I managed to solve this. In the end I came up with the following:

DECLARE @R_Script NVARCHAR(MAX);

SET @R_Script = N'
                # Running product of growth factors (r + 100)/100, via exp(cumsum(log(...)))
                cumu <- function(r) exp(cumsum(log(abs((r + 100)/100))));
                OutputDataSet <- InputDataSet;
                OutputDataSet[,6]  <- cumu(InputDataSet$FundReturn);
                OutputDataSet[,7]  <- cumu(InputDataSet$BenchmarkReturn);
                OutputDataSet[,8]  <- cumu(InputDataSet$SectorReturn);
                OutputDataSet[,9]  <- OutputDataSet[,6] - 1;
                OutputDataSet[,10] <- OutputDataSet[,7] - 1;
                OutputDataSet[,11] <- OutputDataSet[,8] - 1;';

DECLARE @SQL_Script NVARCHAR(MAX)

SET @SQL_Script = N'
                    SELECT 
                            FundID
                        ,   Date
                        ,   CONVERT(DECIMAL(38,6), FundReturn)          AS FundReturn
                        ,   CONVERT(DECIMAL(38,6), BenchmarkReturn)     AS BenchmarkReturn
                        ,   CONVERT(DECIMAL(38,6), SectorReturn)        AS SectorReturn

                    FROM [InMemory].[dbo].[CumulativePerformanceTest] 
                    WHERE FundID = ''F000002D0V''
                    ORDER BY FundID,Date;';

EXEC sp_execute_external_script
        @language       = N'R'
    ,   @script         = @R_Script
    ,   @input_data_1   = @SQL_Script

WITH RESULT SETS    (
                    (
                            [FundID]                    NVARCHAR(50)
                        ,   [Date]                      DATE
                        ,   [FundReturn]                DECIMAL(38,6)
                        ,   [BenchmarkReturn]           DECIMAL(38,6)
                        ,   [SectorReturn]              DECIMAL(38,6)
                        ,   [FundReturnCumu100]         DECIMAL(38,6)
                        ,   [BenchmarkReturnCumu100]    DECIMAL(38,6)
                        ,   [SectorReturnCumu100]       DECIMAL(38,6)
                        ,   [FundReturnCumu0]           DECIMAL(38,6)
                        ,   [BenchmarkReturnCumu0]      DECIMAL(38,6)
                        ,   [SectorReturnCumu0]         DECIMAL(38,6)
                    )
                    );

GO

I know the code could do with a bit of tidying up, but it works :)
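One edge case is worth knowing about for the log-based formulation. A month with a return of exactly -100% gives a growth factor of 0. In T-SQL, LOG(0) raises an error. In R, log(0) is -Inf and exp(-Inf) is 0, so the R versions above do not fail. A direct running product (cumprod() in R) avoids the detour through logarithms, and the abs() call, altogether. Note that abs() would also silently flip the sign of any factor below zero. The Python sketch below illustrates the zero-factor case. The return series is hypothetical, and Python's math.log stands in for the SQL LOG behaviour; it is not R's.

```python
import math
from itertools import accumulate
from operator import mul

# Hypothetical return series (percent) containing a -100% month.
returns = [2.0, -100.0, 5.0]
factors = [1 + r / 100 for r in returns]

# Direct running product: the zero factor simply carries through to later months.
direct = list(accumulate(factors, mul))
print("direct running product:", direct)


def log_based(factors):
    """Running product via exp(running sum of logs); undefined once a factor is 0."""
    return [math.exp(s) for s in accumulate(math.log(f) for f in factors)]


try:
    log_based(factors)
    log_failed = False
except ValueError as exc:  # math.log(0.0) raises ValueError ("math domain error")
    log_failed = True
    print("log-based version failed:", exc)
```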