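I am still new to R and would appreciate help speeding up the for loop below.
The code is meant to create a percentile rank for each row, by sector and month, based on the ROE column.
I am replicating in R an Excel spreadsheet that uses the PERCENTRANK.INC function, and the R code needs to reproduce it exactly. I have looked into options in R for matching this function, such as dplyr approaches, but none seems to reproduce the results exactly except the loop below.
The crux of the problem is that the loop takes 30 minutes to create all the percentile ranks for the data frame (the input data frame has about 90,000 rows in total). Does anyone have tips for speeding up the loop below? I have read many similar questions and answers on this site and tried a number of approaches, such as adjusting the subset statements near the top of the loop, but with only limited improvement.
Details of the input file 'ROE_Quintiles'
Thank you very much for your help.
SMG
My R code is shown below: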
# Create dataframe to append to at the end of each iteration
ROE_Quintiles3 <- data.frame("Merge_Var3" = c('Temp'), "ROE2_percrank" = c(0.5))
End <- nrow(ROE_Quintiles)
system.time({
  for(i in 1:End) {
    Row <- ROE_Quintiles[i,]
    Row_Value <- subset(Row, select=c(ROE2))
    Row_Value2 <- mean(Row_Value$ROE2) # PercentRankArgument Value
    Row_Sector_Month <- subset(Row, select=c(Merge_Var4))
    Row_Sector_Month_Values <- subset(ROE_Quintiles, Merge_Var4==Row_Sector_Month$Merge_Var4, select=c(ROE2))
    # Filter Number to values less than the row value
    NumberLessThanArgument = subset(Row_Sector_Month_Values, ROE2 < Row_Value2)
    # Filter Number to values greater than or equal to the row value
    NumberGreaterThanOrEqualArgument = subset(Row_Sector_Month_Values, ROE2 >= Row_Value2)
    # RankLower = the count of Numbers less than row value, and is used later for
    # interpolation of ranks
    RankLower <- nrow(NumberLessThanArgument)
    # NumberLower = the largest Number < row value, used for interpolation
    NumberLower <- ifelse(RankLower==0, Row_Value2, max(NumberLessThanArgument))
    # NumberUpper = the smallest Number >= row value, used for interpolation
    NumberUpper = min(NumberGreaterThanOrEqualArgument)
    # PercentRankArgumentRank = the rank of row value over the Number table, which is
    # just RankLower + 1. This is the same rank as NumberUpper in the Number table itself.
    PercentRankArgumentRank = RankLower + 1
    # InterpolationFraction = fraction that row value is from NumberLower to NumberUpper
    InterpolationFraction <- ifelse(RankLower==0, 0, (Row_Value2 - NumberLower)/(NumberUpper - NumberLower))
    # Calculate the interpolated rank
    RankInterpolated = max(1, RankLower + InterpolationFraction * (PercentRankArgumentRank - RankLower))
    # Get the count of Numbers
    NumberCount = nrow(Row_Sector_Month_Values)
    # Final PercentRank is (RankInterpolated - 1)/(NumberCount - 1)
    PercentRankOutput = (RankInterpolated - 1)/(NumberCount - 1)
    # Append to create main dataframe
    Row_Output <- subset(Row, select=c(Merge_Var3))
    Row_Output$ROE2_percrank <- PercentRankOutput
    ROE_Quintiles3 <- rbind(ROE_Quintiles3, Row_Output)
  }
})
ROE_Quintiles3 <- subset(ROE_Quintiles3, Merge_Var3 != 'Temp')
Answer 0 (score: 2)
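As with most "my R loop is slow" questions, the cause is usually an object being grown inside the loop. When I see ROE_Quintiles3 <- rbind(ROE_Quintiles3, Row_Output) in your loop, I suspect that is the problem.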
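To make the pattern concrete, here is a minimal sketch on made-up data. The helper names grow_rbind and collect_list are hypothetical and come from neither the question nor any package. The first helper grows a data frame with rbind() on every iteration, so the accumulated rows are copied again each time. The second stores each piece in a preallocated list and binds once after the loop.

```r
# Toy input: column names mirror the question, values are invented.
toy <- data.frame(Merge_Var3 = paste0("id", 1:200),
                  ROE2 = (1:200) / 100,
                  stringsAsFactors = FALSE)

# Pattern from the question: the result is re-copied on every iteration.
grow_rbind <- function(df) {
  out <- df[0, ]                      # empty frame with the same columns
  for (i in seq_len(nrow(df))) {
    out <- rbind(out, df[i, ])
  }
  out
}

# Alternative: fill a preallocated list, then bind once after the loop.
collect_list <- function(df) {
  pieces <- vector("list", nrow(df))
  for (i in seq_len(nrow(df))) {
    pieces[[i]] <- df[i, ]
  }
  do.call(rbind, pieces)
}

res_grow <- grow_rbind(toy)
res_list <- collect_list(toy)
```

Both helpers should return the same rows. Wrapping each call in system.time() on a larger input is one way to see how the two approaches scale on your machine.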
See https://privefl.github.io/blog/why-loops-are-slow-in-r/ for an explanation of the problem I am pointing out and several solutions (hint: I would go for something like gen_list()).
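It may also be possible to drop the loop altogether. Because each row's value is itself a member of its sector-month group, the interpolation step in the question's loop appears to reduce to (number of group values strictly below the row value) / (group size - 1). Under that assumption, and assuming ROE2 contains no missing values, the same quantity can be sketched with base R's rank() and ave(). The function name percent_rank_inc and the sample data are hypothetical, and the output should be checked against the spreadsheet before relying on it.

```r
# Hypothetical helper: (count of values strictly below x) / (n - 1).
# With no NAs, rank(..., ties.method = "min") - 1 equals that count.
percent_rank_inc <- function(v) {
  (rank(v, ties.method = "min") - 1) / (length(v) - 1)
}

# Made-up data using the question's column names.
ROE_Quintiles <- data.frame(
  Merge_Var3 = c("a", "b", "c", "d", "e", "f", "g"),
  Merge_Var4 = c("Tech_Jan", "Tech_Jan", "Tech_Jan", "Tech_Jan",
                 "Bank_Jan", "Bank_Jan", "Bank_Jan"),
  ROE2 = c(0.10, 0.30, 0.20, 0.30, 0.05, 0.15, 0.25)
)

# ave() applies the helper within each Merge_Var4 group and returns the
# results in the original row order.
ROE_Quintiles$ROE2_percrank <- ave(ROE_Quintiles$ROE2,
                                   ROE_Quintiles$Merge_Var4,
                                   FUN = percent_rank_inc)
```

A group containing a single row gives 0/0, which is NaN. The original loop computes the same thing in that case, since its final step is also a division by NumberCount - 1.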