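I am new to R and have the following question. I am trying to run the for loop below. On 30,000 rows it takes 20 minutes. I need to run it on 4 million rows; when I tried, it took nearly 3 days. Is there a way to reduce the time the loop takes?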
for (i in length(Data$CLAIM):1)
{
  if (i == length(Data$CLAIM))
  {
    Data$Net_Claim_Amt_Calc[i] = Data$INETCLMAMT[i]
    Data$GOL_Calc[i] = Data$GOL[i]
    Data$GLP_Calc[i] = Data$GLP[i]
    Data$NOLCLM_Calc[i] = Data$NOLCLM[i]
    Data$NLPCLM_Calc[i] = Data$NLPCLM[i]
  }
  else
  {
    if (Data$CLAIM[i] == Data$CLAIM[i+1])
    {
      Data$Net_Claim_Amt_Calc[i] = sum(Data$INETCLMAMT[i], Data$Net_Claim_Amt_Calc[i+1])
      Data$GOL_Calc[i] = Data$GOL[i] + Data$GOL_Calc[i+1]
      Data$GLP_Calc[i] = Data$GLP[i] + Data$GLP_Calc[i+1]
      Data$NOLCLM_Calc[i] = Data$NOLCLM[i] + Data$NOLCLM_Calc[i+1]
      Data$NLPCLM_Calc[i] = Data$NLPCLM[i] + Data$NLPCLM_Calc[i+1]
    }
    else
    {
      Data$Net_Claim_Amt_Calc[i] = Data$INETCLMAMT[i]
      Data$GOL_Calc[i] = Data$GOL[i]
      Data$GLP_Calc[i] = Data$GLP[i]
      Data$NOLCLM_Calc[i] = Data$NOLCLM[i]
      Data$NLPCLM_Calc[i] = Data$NLPCLM[i]
    }
  }
}
Answer 0 (score: 0)
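This code can easily be vectorised.
Start with a simple data set:
claims = sample(1:3, 10, replace=TRUE)
Using the diff function, we can find the positions where a value equals the one that follows it: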
d = diff(claims)
equals = which(d==0)
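To make the indices concrete, here is a small sketch on a fixed vector instead of a random sample. The `toy_` names and the values are made up for illustration only; the point is that `which(d == 0)` returns each position k where element k equals element k + 1.

```r
# Fixed toy vector (hypothetical values, chosen so the result is reproducible)
toy_claims <- c(1, 1, 2, 3, 3, 3)

# diff() returns toy_claims[k + 1] - toy_claims[k] for k = 1..(n - 1)
toy_d <- diff(toy_claims)

# Positions k where toy_claims[k] == toy_claims[k + 1]
toy_equals <- which(toy_d == 0)

print(toy_d)       # 0 1 1 0 0
print(toy_equals)  # 1 4 5
```

Positions 4 and 5 both appear because the value 3 repeats three times in a row, which matters when the update for one row depends on the already updated value of the next row.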
Your code then becomes something like the following. First, the default case, applied to every row at once:
## Standard
Data$Net_Claim_Amt_Calc = Data$INETCLMAMT
Data$GOL_Calc = Data$GOL
Data$GLP_Calc = Data$GLP
Data$NOLCLM_Calc = Data$NOLCLM
Data$NLPCLM_Calc = Data$NLPCLM
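Then, for the rows whose CLAIM equals the CLAIM of the next row (with equals computed from Data$CLAIM in the same way as above):
i = equals
Data$Net_Claim_Amt_Calc[i] = Data$INETCLMAMT[i] + Data$Net_Claim_Amt_Calc[i+1]
Data$GOL_Calc[i] = Data$GOL[i] + Data$GOL_Calc[i+1]
Data$GLP_Calc[i] = Data$GLP[i] + Data$GLP_Calc[i+1]
Data$NOLCLM_Calc[i] = Data$NOLCLM[i] + Data$NOLCLM_Calc[i+1]
Data$NLPCLM_Calc[i] = Data$NLPCLM[i] + Data$NLPCLM_Calc[i+1]
Note that + is used instead of sum() here, because sum() would collapse the whole vector into a single number. Also note that R evaluates the entire right-hand side before assigning, so this matches the original loop only when no CLAIM value appears more than twice in a row; longer runs need a running total within each run.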
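For runs of any length, one possible approach (a sketch, not the method from the answer above) is a reverse cumulative sum within each run of identical consecutive CLAIM values, using base R's `cumsum`, `rev` and `ave`. The helper name `rev_cumsum_by_run` and the toy data are hypothetical, and the sketch assumes CLAIM contains no NA values.

```r
# Hypothetical helper: for each row, sum x from that row to the end of its
# run of identical consecutive claim values (assumes no NA in `claim`).
rev_cumsum_by_run <- function(x, claim) {
  n <- length(claim)
  # Run id increases by one whenever the claim differs from the previous row
  run_id <- cumsum(c(TRUE, claim[-1] != claim[-n]))
  # Within each run: reverse, take the running total, reverse back
  ave(x, run_id, FUN = function(v) rev(cumsum(rev(v))))
}

# Toy data, made up for illustration; column names follow the question
Data <- data.frame(
  CLAIM      = c("A", "A", "A", "B", "C", "C"),
  INETCLMAMT = c(10, 20, 30, 5, 1, 2),
  GOL        = c(1, 1, 1, 2, 3, 3)
)

Data$Net_Claim_Amt_Calc <- rev_cumsum_by_run(Data$INETCLMAMT, Data$CLAIM)
Data$GOL_Calc           <- rev_cumsum_by_run(Data$GOL, Data$CLAIM)

print(Data$Net_Claim_Amt_Calc)  # 60 50 30 5 3 2
print(Data$GOL_Calc)            # 3 2 1 2 6 3
```

The same call would be repeated for GLP, NOLCLM and NLPCLM. Because the run id is based on consecutive rows rather than on the CLAIM value itself, a claim that reappears later in the data starts a new total, as it does in the original loop.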