我正在使用for循环。 for循环的目标是简单地测试条件并适当地编码数据。该循环必须迭代503,288个唯一值,并包括三个if语句。有没有一种方法可以加快for循环?
代码如下:
count<- 0
for(i in unique(Data$ID)){ #503288
#Subset Relevant Data
Loop_Before<- subset(Primary_Before, ID == i); Loop_After <- subset(Primary_After, ID == i)
if(nrow(Loop_Before) >= 1 & nrow(Loop_After) >= 1){
Data$Status[Data$ID == i] <- "Both Elections"
}
if(nrow(Loop_Before) >= 1 & nrow(Loop_After) == 0){
Data$Status[Data$ID == i] <- "Only Primary Election"
}
if(nrow(Loop_Before) >= 0 & nrow(Loop_After) == 1){
Data$Status[Data$ID == i] <- "Only General Election"
}
message(count<- count +1)
}
table(Data$Status)
谢谢您的帮助!
答案 0 :(得分:4)
完全避免for
循环。我不知道您的数据集,但以下内容应该快10倍甚至100倍:
library(tidyverse) # load some packages that will help
# let's create some sample data
Data <- data.frame(ID = c(1,1,1,1,2,2,2,3,3))
Primary_before <- data.frame(ID = c(0,1,2,2,3,3,3))
Primary_after <- data.frame(ID = c(1,3))
# now for every ID we count the nr of rows in both dataframes
summarised_before <- Primary_before %>%
group_by(ID) %>%
summarise(nrRows = n())
ID nrRows
<dbl> <int>
1 0 1
2 1 1
3 2 2
4 3 3
summarised_after <- Primary_after %>%
group_by(ID) %>%
summarise(nrRows = n())
ID nrRows
<dbl> <int>
1 1 1
2 3 1
# now we join them together
summarised_both <- summarised_after %>%
full_join(summarised_before, by = "ID", suffix = c("_after", "_before"))
# and now we do the final calculation
summarised_both %>%
mutate(nrRows_after = replace_na(nrRows_after, 0)) %>%
mutate(Status = case_when(nrRows_before >= 1 & nrRows_after >= 1 ~ "Both elections"
, nrRows_before >= 1 & nrRows_after == 0 ~ "Only primary election"
, nrRows_before >= 0 & nrRows_after == 1 ~ "Only general election")) %>%
filter(ID %in% Data$ID)
我保存了中间结果,但是您也可以一口气做到这一点,就像这样:
Primary_before %>%
group_by(ID) %>%
summarise(nrRows = n()) %>%
full_join(Primary_after %>%
group_by(ID) %>%
summarise(nrRows = n())
, by = "ID"
, suffix = c("_after", "_before")) %>%
mutate(nrRows_after = replace_na(nrRows_after, 0)) %>%
mutate(Status = case_when(nrRows_before >= 1 & nrRows_after >= 1 ~ "Both elections"
, nrRows_before >= 1 & nrRows_after == 0 ~ "Only primary election"
, nrRows_before >= 0 & nrRows_after == 1 ~ "Only general election")) %>%
filter(ID %in% Data$ID)