Applying a function to multiple subjects and keeping the results in a table in R

Posted: 2014-07-07 01:21:49

Tags: r function apply subject

I am using the ChainLadder package and applying its function MackChainLadder().

I have data in Excel containing many cumulative payment triangles, in the following format:


ID  A    DY1    DY2     DY3     DY4     DY5     DY6     DY7     DY8     DY9     DY10
27  1   1526    2860    3575    3544    3700    3714    3714    3681    3669    3595
27  2   1717    3619    3706    3687    3547    3511    4306    4235    4231    
27  3   5042    9957    12912   14249   15801   15659   15766   15811       
27  4   8399    15946   19155   20120   20645   20357   20517           
27  5   6494    11274   13220   13907   14383   14435               
27  6   7340    13426   16916   18420   18605                   
27  7   11004   20474   26251   28964                       
27  8   13107   23065   26150                           
27  9   17250   23922                               
27  10  18833                                   
35  1   90751   130298  147618  159509  165517  169048  170988  171730  171854  172295
35  2   105370  152668  174635  189384  196637  200010  202436  203752  204294  
35  3   123692  181088  207566  225943  235993  243255  245759  247297      
35  4   142717  206614  236415  255642  265007  269450  272050          
35  5   167997  235772  267384  286297  294015  297560              
35  6   169744  237253  270713  288991  299322                  
35  7   204752  279957  317459  342275                      
35  8   231249  318056  359736                          
35  9   257927  358768                              
35  10  300881                                  
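Since the triangles live in Excel, they first have to get into R with the blank cells as NA. A minimal sketch, assuming the sheet has been saved as CSV with empty cells for the unobserved part of each triangle; the values are the first cells of the table above, cut down to toy 3 × 3 triangles, and the file name in the comment is hypothetical:

```r
# Toy CSV text in the question's layout: the first cells of the table above,
# cut down to 3 x 3 triangles. Blank fields are the not-yet-observed cells.
csv_text <- "ID,A,DY1,DY2,DY3
27,1,1526,2860,3575
27,2,1717,3619,
27,3,5042,,
35,1,90751,130298,147618
35,2,105370,152668,
35,3,123692,,"

# For a real export this would be read.csv("triangles.csv", na.strings = "")
# (hypothetical file name). na.strings = "" makes the empty cells NA explicitly.
dd_small <- read.csv(text = csv_text, na.strings = "")
str(dd_small)
```

If you would rather read the .xlsx file directly, `readxl::read_excel()` is one option; it also returns empty cells as NA.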

You can find the data and examples for the ChainLadder package here:

http://code.google.com/p/chainladder/wiki/Examples

Example ChainLadder output:

  

library(ChainLadder)
M <- MackChainLadder(RAA, est.sigma = "Mack")
M

MackChainLadder(Triangle = RAA, est.sigma = "Mack")


     Latest Dev.To.Date Ultimate   IBNR Mack.S.E    CV
1981 18,834       1.000   18,834      0        0   NaN
1982 16,704       0.991   16,858    154      206 1.339
1983 23,466       0.974   24,083    617      623 1.010
1984 27,067       0.943   28,703  1,636      747 0.457
1985 26,180       0.905   28,927  2,747    1,469 0.535
1986 15,852       0.813   19,501  3,649    2,002 0.549
1987 12,314       0.694   17,749  5,435    2,209 0.406
1988 13,112       0.546   24,019 10,907    5,358 0.491
1989  5,395       0.336   16,045 10,650    6,333 0.595
1990  2,063       0.112   18,402 16,339   24,566 1.503

               Totals
Latest:    160,987.00
Ultimate:  213,122.23
IBNR:       52,135.23
Mack S.E.:  26,909.01
CV:              0.52
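For reference, in the output above each IBNR is Ultimate minus Latest (for 1990: 18,402 − 2,063 = 16,339), and Dev.To.Date is Latest divided by Ultimate. With its default alpha = 1, MackChainLadder projects the ultimates using volume-weighted development factors f_k = Σ C[i, k+1] / Σ C[i, k], summed over the origin years where both cells are observed. The Mack.S.E. column comes from Mack's variance model and is not reproduced here. Below is a minimal base-R sketch of the point estimate only, on an illustrative 3 × 3 triangle of toy values:

```r
# Illustrative cumulative triangle (toy values): rows = origin years,
# columns = development years, NA = unobserved lower-right part.
tri <- matrix(c(100, 150, 165,
                110, 160,  NA,
                120,  NA,  NA),
              nrow = 3, byrow = TRUE)

chain_ladder_ibnr <- function(tri) {
  n <- ncol(tri)
  full <- tri
  for (k in seq_len(n - 1)) {
    # Origin years where both development year k and k + 1 are observed.
    obs <- !is.na(tri[, k]) & !is.na(tri[, k + 1])
    f_k <- sum(tri[obs, k + 1]) / sum(tri[obs, k])  # volume-weighted factor
    # Project the missing cells of column k + 1 from (possibly projected) column k.
    miss <- is.na(full[, k + 1])
    full[miss, k + 1] <- full[miss, k] * f_k
  }
  # Latest observed value on each row (its last non-NA entry).
  latest <- apply(tri, 1, function(r) tail(r[!is.na(r)], 1))
  ultimate <- full[, n]
  data.frame(Latest = latest, Ultimate = ultimate,
             IBNR = ultimate - latest, Dev.To.Date = latest / ultimate)
}

res <- chain_ladder_ibnr(tri)
res
sum(res$IBNR)
```

Here f_1 = (150 + 160) / (100 + 110) and f_2 = 165 / 150 = 1.1, so the second origin year's ultimate is 160 × 1.1 = 176 and its IBNR is 16.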

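I have thousands of IDs (companies).

I want to apply the R chain-ladder method (MackChainLadder) to each triangle and get the IBNR and the standard error of the IBNR. However, I don't know how to apply the chain ladder to each company, or how to collect the results in a table like this: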


ID  IBNR  Mack_S.E.
11  11111  2222
24  33333  4444 (I made up those numbers)

Can I use gapply()? And how do I get my results into a table? I'm new to R, so any comments are welcome.

1 Answer:

Answer 0 (score: 0)

So here is the data in copy/paste-able form; I named it dd:

dd<-structure(list(ID = c(27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 
27L, 27L, 35L, 35L, 35L, 35L, 35L, 35L, 35L, 35L, 35L, 35L), 
    A = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 
    4L, 5L, 6L, 7L, 8L, 9L, 10L), DY1 = c(1526L, 1717L, 5042L, 
    8399L, 6494L, 7340L, 11004L, 13107L, 17250L, 18833L, 90751L, 
    105370L, 123692L, 142717L, 167997L, 169744L, 204752L, 231249L, 
    257927L, 300881L), DY2 = c(2860L, 3619L, 9957L, 15946L, 11274L, 
    13426L, 20474L, 23065L, 23922L, NA, 130298L, 152668L, 181088L, 
    206614L, 235772L, 237253L, 279957L, 318056L, 358768L, NA), 
    DY3 = c(3575L, 3706L, 12912L, 19155L, 13220L, 16916L, 26251L, 
    26150L, NA, NA, 147618L, 174635L, 207566L, 236415L, 267384L, 
    270713L, 317459L, 359736L, NA, NA), DY4 = c(3544L, 3687L, 
    14249L, 20120L, 13907L, 18420L, 28964L, NA, NA, NA, 159509L, 
    189384L, 225943L, 255642L, 286297L, 288991L, 342275L, NA, 
    NA, NA), DY5 = c(3700L, 3547L, 15801L, 20645L, 14383L, 18605L, 
    NA, NA, NA, NA, 165517L, 196637L, 235993L, 265007L, 294015L, 
    299322L, NA, NA, NA, NA), DY6 = c(3714L, 3511L, 15659L, 20357L, 
    14435L, NA, NA, NA, NA, NA, 169048L, 200010L, 243255L, 269450L, 
    297560L, NA, NA, NA, NA, NA), DY7 = c(3714L, 4306L, 15766L, 
    20517L, NA, NA, NA, NA, NA, NA, 170988L, 202436L, 245759L, 
    272050L, NA, NA, NA, NA, NA, NA), DY8 = c(3681L, 4235L, 15811L, 
    NA, NA, NA, NA, NA, NA, NA, 171730L, 203752L, 247297L, NA, 
    NA, NA, NA, NA, NA, NA), DY9 = c(3669L, 4231L, NA, NA, NA, 
    NA, NA, NA, NA, NA, 171854L, 204294L, NA, NA, NA, NA, NA, 
    NA, NA, NA), DY10 = c(3595L, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, 172295L, NA, NA, NA, NA, NA, NA, NA, NA, NA)), .Names = c("ID", 
"A", "DY1", "DY2", "DY3", "DY4", "DY5", "DY6", "DY7", "DY8", 
"DY9", "DY10"), class = "data.frame", row.names = c(NA, -20L))

Then we can simply split the data.frame into chunks by ID and apply the function to each subset.
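The split step assumes that each company's rows are already in accident-year order (A = 1, 2, ...), which holds for the sample data. With thousands of companies coming from a spreadsheet that may not be guaranteed. A minimal sketch on toy data that sorts first and then checks that every block has one row per DY column, as the 10 × 10 triangles in the question do (the check is my own assumption of what is worth verifying):

```r
# Illustrative long-format data (toy values), deliberately out of order.
df <- data.frame(ID  = c(35, 27, 27, 35),
                 A   = c(2, 2, 1, 1),
                 DY1 = c(110, 12, 10, 100),
                 DY2 = c(NA, NA, 15, 140))

# Sort by company, then accident year, so each block reads as a triangle
# (first row = oldest accident year).
df <- df[order(df$ID, df$A), ]

# Drop ID and A, then split the remaining DY columns by company ID.
pieces <- split(df[, -(1:2)], df$ID)
names(pieces)

# Sanity check: as many rows as development-year columns in every block.
complete <- vapply(pieces, function(p) nrow(p) == ncol(p), logical(1))
complete
```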

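library('ChainLadder')

# helper: stack a named list of named vectors into one data.frame,
# adding the list names as an "id" column
dstack <- function(x) do.call(rbind, Map(function(id, dd)
    cbind.data.frame(id = id, data.frame(as.list(dd))), names(x), x))

# split the DY columns by company ID, fit MackChainLadder to each block,
# summarise it, and keep the IBNR and Mack S.E. totals
# (row labels as printed by the ChainLadder version used here)
dstack(lapply(lapply(lapply(split(dd[, -(1:2)], dd$ID), MackChainLadder), summary),
    function(x) setNames(x$Totals[c("IBNR:", "Mack S.E.:"), ], c("IBNR", "Mack S.E."))))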

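Here we split the data (excluding the first two columns) and run MackChainLadder on each subset. Then we call summary on each fitted object (which computes the statistics), and finally we extract the statistics we want. I then use dstack to stack the values back into a data.frame and attach each company's ID. With this sample data I get the result: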

   id      IBNR Mack.S.E.
27 27  35588.59  8839.536
35 35 481647.16 17130.185
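The nested lapply() call can also be written as separate steps. Below is a minimal base-R sketch of the same split, fit, extract and stack pattern. It uses vapply() so each company's result is a fixed-length named vector, and then turns the list names back into an ID column. company_stats() is a hypothetical stand-in for the MackChainLadder-plus-summary step, so the example runs without the package; the data are toy values.

```r
# Illustrative long-format data (toy values): two companies, 2 x 2 triangles.
df <- data.frame(ID  = c(27, 27, 35, 35),
                 A   = c(1, 2, 1, 2),
                 DY1 = c(10, 12, 100, 110),
                 DY2 = c(15, NA, 140, NA))

# Hypothetical stand-in for "fit and summarise one company". With ChainLadder
# installed it might instead be (assumed, not run here):
#   function(tri) {
#     m <- ChainLadder::MackChainLadder(as.matrix(tri))
#     c(IBNR = sum(summary(m)$ByOrigin$IBNR, na.rm = TRUE),
#       Mack.S.E = m$Total.Mack.S.E)
#   }
# with FUN.VALUE below changed to c(IBNR = 0, Mack.S.E = 0).
company_stats <- function(tri) {
  latest <- apply(tri, 1, function(r) tail(r[!is.na(r)], 1))
  c(Latest = sum(latest), Observed = sum(!is.na(tri)))
}

pieces <- split(df[, -(1:2)], df$ID)

# One column per company; FUN.VALUE fixes the length and names of each result.
stats <- vapply(pieces, company_stats, FUN.VALUE = c(Latest = 0, Observed = 0))

# Transpose so companies become rows, and turn the list names into an ID column.
out <- data.frame(ID = as.integer(colnames(stats)), t(stats))
rownames(out) <- NULL
out
```

vapply() stops with an error if any company returns a result of the wrong length or type, which is useful when looping over thousands of IDs.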