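I would like to filter on multiple conditions in a general, dplyr-style way. My goal is to get, for each group, the first month in which the group reaches the target of 40000. Given this data: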
group month output cumulouput indi
(fctr) (int) (dbl) (dbl) (dbl)
A 1 9735.370 9735.37 0
A 2 10468.063 20203.43 0
A 3 11494.736 31698.17 0
B 1 10186.465 10186.46 0
B 2 9771.083 19957.55 0
B 3 9871.636 29829.18 0
B 4 9877.264 39706.45 0
B 5 9009.198 48715.65 1
B 6 9874.526 58590.17 1
C 1 10613.868 10613.87 0
C 2 10503.673 21117.54 0
C 3 10397.098 31514.64 0
C 4 9709.228 41223.87 1
C 5 9861.669 51085.54 1
C 6 9137.551 60223.09 1
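For each group, I want the minimum month at which the group reached the target, or, if the group never reached the target, its maximum month.
This is the expected result of the filter: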
group month output cumulouput indi
(fctr) (int) (dbl) (dbl) (dbl)
A 3 11494.736 31698.17 0
B 5 9009.198 48715.65 1
C 4 9709.228 41223.87 1
Code to generate the data:
library(dplyr)
df1 <- data.frame(group = rep(LETTERS[1:3], each=6), month = rep(1:6,3)) %>%
arrange(group,month) %>%
mutate(output = rnorm(n=18,mean = 10000, sd = 722))%>%
group_by(group) %>%
mutate(cumulouput=cumsum(output))%>%
filter(!(group=="A"&month>=4)) %>%
mutate( indi= ifelse(cumulouput>40000,1,0))
Answer 0 (score: 0)
This will give you the desired output, but I feel it could be shortened a bit.
library(dplyr)
df1 <- data.frame(group = rep(LETTERS[1:3], each=6), month = rep(1:6,3)) %>%
arrange(group,month) %>%
mutate(output = rnorm(n=18,mean = 10000, sd = 722))%>%
group_by(group) %>%
mutate(cumulouput=cumsum(output))%>%
filter(!(group=="A"&month>=4)) %>%
mutate( indi= ifelse(cumulouput>40000,1,0))
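# one: for each group that reached the target, the first row above 40000
one <- df1 %>%
group_by(group) %>%
.[.$cumulouput > 40000,] %>%
filter(row_number(cumulouput) == 1)
# two: every row that is still below the target
two <- df1 %>%
group_by(group) %>%
.[.$indi == 0,]
# three: per group, keep the row with the largest cumulative output,
# i.e. the first hit if there is one, otherwise the last month
three <- rbind(one,two) %>%
group_by(group) %>%
filter(cumulouput == max(cumulouput))%>%
arrange(group)
head(three)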
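As one possible way to shorten this, the three steps can be folded into a single grouped filter. This is only a sketch under a few assumptions: the data frame is typed in by hand from the table in the question (without the output column, which the filter does not need), rows are identified by month, and the name result is made up for illustration.

```r
library(dplyr)

# Assumption: values copied by hand from the table in the question,
# so the example does not depend on a random draw.
df1 <- data.frame(
  group = rep(c("A", "B", "C"), times = c(3, 6, 6)),
  month = c(1:3, 1:6, 1:6),
  cumulouput = c(9735.37, 20203.43, 31698.17,
                 10186.46, 19957.55, 29829.18, 39706.45, 48715.65, 58590.17,
                 10613.87, 21117.54, 31514.64, 41223.87, 51085.54, 60223.09)
) %>%
  mutate(indi = ifelse(cumulouput > 40000, 1, 0))

# result is a hypothetical name used only in this sketch.
result <- df1 %>%
  group_by(group) %>%
  # Per group: if the target was reached, keep the first month with indi == 1;
  # otherwise keep the group's last month.
  filter(month == if (any(indi == 1)) min(month[indi == 1]) else max(month)) %>%
  ungroup()

print(result)
```

With the table values above this keeps month 3 for A, month 5 for B and month 4 for C. The if/else is evaluated once per group, so no temporary column is needed.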
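Answer 1 (score: -1)
The logic here is as follows: for each row within a group, it checks whether indi==1. If TRUE, it returns the min month at which the target is met; if FALSE, it returns the max month at which the target is not met. The subsets month[indi==1] and month[indi==0] are evaluated within each group.
Then filter the rows where month matches the m we just added, and filter on max(indi) within each group to remove the earlier months of groups that did reach the target.
Finally, remove the temporary column m.
df1 %>% group_by(group) %>%
# the Inf / -Inf sentinels keep min()/max() from warning on empty subsets
# and make both branches of if_else() the same type (double)
mutate(m=if_else(indi==1, min(month[indi==1], Inf), max(month[indi==0], -Inf))) %>%
filter(month==m, indi==max(indi)) %>%
select(-m)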