Time required to run a loop

Date: 2015-12-18 08:30:21

Tags: r

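I'm new to R and have the following question. I'm trying to run the for loop below. On 30,000 rows it takes 20 minutes, and I want to run it on 4 million rows. When I tried, it took almost 3 days. Is there any way to reduce the time the loop takes?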

for(i in length(Data$CLAIM):1)
{
  if(i==length(Data$CLAIM))
  {
    Data$Net_Claim_Amt_Calc[i]=Data$INETCLMAMT[i]
    Data$GOL_Calc[i]=Data$GOL[i]
    Data$GLP_Calc[i]=Data$GLP[i]
    Data$NOLCLM_Calc[i]=Data$NOLCLM[i]
    Data$NLPCLM_Calc[i]=Data$NLPCLM[i]
  }
  else 
  {
    if(Data$CLAIM[i]==Data$CLAIM[i+1])

    {
      Data$Net_Claim_Amt_Calc[i]=sum(Data$INETCLMAMT[i],Data$Net_Claim_Amt_Calc[i+1])
      Data$GOL_Calc[i]=Data$GOL[i]+Data$GOL_Calc[i+1]
      Data$GLP_Calc[i]=Data$GLP[i]+Data$GLP_Calc[i+1]
      Data$NOLCLM_Calc[i]=Data$NOLCLM[i]+Data$NOLCLM_Calc[i+1]
      Data$NLPCLM_Calc[i]=Data$NLPCLM[i]+Data$NLPCLM_Calc[i+1]
    }
    else
    {
      Data$Net_Claim_Amt_Calc[i]=Data$INETCLMAMT[i]
      Data$GOL_Calc[i]=Data$GOL[i]
      Data$GLP_Calc[i]=Data$GLP[i]
      Data$NOLCLM_Calc[i]=Data$NOLCLM[i]
      Data$NLPCLM_Calc[i]=Data$NLPCLM[i]
    }
  }

}
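For reference, the loop gives each row the sum of its own value and the values of all following rows with the same CLAIM, and it starts a new total whenever CLAIM changes. The sketch below runs the same backward pass over plain vectors. It assumes numeric amount columns without NA values, and the helper run_sum_back and the toy data are hypothetical. Writing into a preallocated vector, instead of assigning Data$col[i] on the data frame inside the loop, avoids the data frame's replacement method on every iteration. That is usually much faster, although no timings are claimed here.

```r
# Hypothetical helper: the question's backward pass for one column,
# written into a preallocated plain vector instead of Data$col[i].
run_sum_back <- function(claim, amount) {
  n <- length(claim)
  out <- numeric(n)                       # allocate the result once
  if (n == 0) return(out)
  out[n] <- amount[n]                     # last row: its own value
  for (i in rev(seq_len(n - 1))) {        # rows n-1 down to 1
    if (claim[i] == claim[i + 1]) {
      out[i] <- amount[i] + out[i + 1]    # same claim: add the running total below
    } else {
      out[i] <- amount[i]                 # claim changes: start a new total
    }
  }
  out
}

# Toy data (made-up values)
Data <- data.frame(CLAIM = c(1, 1, 1, 2, 3, 3),
                   INETCLMAMT = c(10, 20, 30, 5, 1, 2))
Data$Net_Claim_Amt_Calc <- run_sum_back(Data$CLAIM, Data$INETCLMAMT)
print(Data$Net_Claim_Amt_Calc)            # 60 50 30 5 3 2
```

The same call fills the other columns, for example Data$GOL_Calc <- run_sum_back(Data$CLAIM, Data$GOL).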

1 answer

Answer (score: 0)

This code can easily be vectorized.

Start with a simple dataset:

claims = sample(1:3, 10, replace = TRUE)   # 10 random claim IDs between 1 and 3

Using the diff function, we can find the positions where a value equals the next one:

d = diff(claims)          # d[k] is claims[k+1] - claims[k]
equals = which(d==0)      # positions k where claims[k] == claims[k+1]
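To make the indexing concrete, here is a small sketch that uses fixed values instead of sample() so that the result is reproducible. The vectors claims and ids are made up for illustration. diff() only works on numbers, so the second part shows an equivalent comparison of shifted copies for character or factor claim IDs. That variant is an assumption about how CLAIM might be stored.

```r
# Fixed example values instead of sample(), so the output is reproducible
claims <- c(2, 2, 1, 3, 3, 3, 1)

d <- diff(claims)          # d[k] is claims[k + 1] - claims[k], length 6
equals <- which(d == 0)    # positions k where claims[k] == claims[k + 1]
print(equals)              # 1 4 5

# diff() needs numbers; for character or factor IDs compare shifted copies instead
ids <- c("A", "A", "B", "C", "C", "C", "B")
same_next <- which(ids[-1] == ids[-length(ids)])
print(same_next)           # 1 4 5
```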

Your code then becomes something like this:

## Default: every row starts with its own value (whole columns at once)
Data$Net_Claim_Amt_Calc=Data$INETCLMAMT
Data$GOL_Calc=Data$GOL
Data$GLP_Calc=Data$GLP
Data$NOLCLM_Calc=Data$NOLCLM
Data$NLPCLM_Calc=Data$NLPCLM

Then, for the rows where the value equals the next one:

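i = which(diff(Data$CLAIM)==0)   # rows whose CLAIM equals the next row's
# Element-wise + gives one sum per row (sum() would return a single total).
# The right-hand side reads the default values set above, so each row gains only
# the next row's value. This matches the loop only when no claim spans more
# than two consecutive rows.
Data$Net_Claim_Amt_Calc[i]=Data$INETCLMAMT[i]+Data$Net_Claim_Amt_Calc[i+1]
Data$GOL_Calc[i]=Data$GOL[i]+Data$GOL_Calc[i+1]
Data$GLP_Calc[i]=Data$GLP[i]+Data$GLP_Calc[i+1]
Data$NOLCLM_Calc[i]=Data$NOLCLM[i]+Data$NOLCLM_Calc[i+1]
Data$NLPCLM_Calc[i]=Data$NLPCLM[i]+Data$NLPCLM_Calc[i+1]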
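Adding only the next row's value does not reproduce the loop for a claim that spans three or more consecutive rows. The following is a minimal sketch of a vectorized version that handles runs of any length, assuming numeric amount columns without NA values. It groups the rows into blocks of consecutive equal CLAIM values with rle(), then takes a reverse cumulative sum inside each block with ave(). Both are base R functions. The helper run_sum_back_vec and the toy values in Data are hypothetical.

```r
# Hypothetical helper: reverse cumulative sum within each block of
# consecutive rows sharing the same claim ID (what the loop computes).
run_sum_back_vec <- function(claim, amount) {
  runs <- rle(as.vector(claim))                         # lengths of runs of equal consecutive IDs
  block <- rep(seq_along(runs$lengths), runs$lengths)   # block number for every row
  ave(amount, block, FUN = function(x) rev(cumsum(rev(x))))
}

# Toy data (made-up values) with the question's column names
Data <- data.frame(CLAIM = c(1, 1, 1, 2, 3, 3),
                   INETCLMAMT = c(10, 20, 30, 5, 1, 2),
                   GOL = 1:6, GLP = 1:6, NOLCLM = 1:6, NLPCLM = 1:6)

# Output column name = source column name
cols <- c(Net_Claim_Amt_Calc = "INETCLMAMT", GOL_Calc = "GOL", GLP_Calc = "GLP",
          NOLCLM_Calc = "NOLCLM", NLPCLM_Calc = "NLPCLM")
for (out_col in names(cols)) {           # loops over 5 columns, not over rows
  Data[[out_col]] <- run_sum_back_vec(Data$CLAIM, Data[[cols[[out_col]]]])
}
print(Data$Net_Claim_Amt_Calc)   # 60 50 30 5 3 2
print(Data$GOL_Calc)             # 6 5 3 4 11 6
```

ave() still calls the summing function once per block, so millions of distinct claims will not be instantaneous. However, the work grows linearly with the number of rows and there are no per-row data frame assignments. This has not been timed on 4 million rows here.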