dplyr: grouping with conditions and multiple filters

Time: 2016-12-20 19:47:28

Tags: r filter dplyr

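I would like to filter on multiple conditions in a generic way that feels natural in dplyr. My goal is to filter for the first month in which each group reaches the target of 40000. Given this data: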

group month    output cumulouput  indi
(fctr) (int)     (dbl)      (dbl) (dbl)
  A     1  9735.370    9735.37     0
  A     2 10468.063   20203.43     0
  A     3 11494.736   31698.17     0
  B     1 10186.465   10186.46     0
  B     2  9771.083   19957.55     0
  B     3  9871.636   29829.18     0
  B     4  9877.264   39706.45     0
  B     5  9009.198   48715.65     1
  B     6  9874.526   58590.17     1
  C     1 10613.868   10613.87     0
  C     2 10503.673   21117.54     0
  C     3 10397.098   31514.64     0
  C     4  9709.228   41223.87     1
  C     5  9861.669   51085.54     1
  C     6  9137.551   60223.09     1

For each group, I want the minimum month in which the group reaches the target, or, if the group never reaches the target, its maximum month.

This is the expected result of the filter:

group   month    output cumulouput  indi
(fctr) (int)     (dbl)      (dbl) (dbl)
  A     3 11494.736   31698.17     0
  B     5  9009.198   48715.65     1
  C     4  9709.228   41223.87     1

Code to generate the data:

library(dplyr)
df1 <- data.frame(group = rep(LETTERS[1:3], each = 6), month = rep(1:6, 3)) %>%
  arrange(group, month) %>%
  mutate(output = rnorm(n = 18, mean = 10000, sd = 722)) %>%
  group_by(group) %>%
  mutate(cumulouput = cumsum(output)) %>%
  filter(!(group == "A" & month >= 4)) %>%
  mutate(indi = ifelse(cumulouput > 40000, 1, 0))

2 answers:

Answer 0: (score: 0)

This gives you the desired output, but I feel it could be shortened a bit.

library(dplyr)
df1 <- data.frame(group = rep(LETTERS[1:3], each = 6), month = rep(1:6, 3)) %>%
  arrange(group, month) %>%
  mutate(output = rnorm(n = 18, mean = 10000, sd = 722)) %>%
  group_by(group) %>%
  mutate(cumulouput = cumsum(output)) %>%
  filter(!(group == "A" & month >= 4)) %>%
  mutate(indi = ifelse(cumulouput > 40000, 1, 0))

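# First row per group in which the cumulative output exceeds the target
one <- df1 %>%
  group_by(group) %>%
  filter(cumulouput > 40000) %>%
  filter(row_number(cumulouput) == 1)

# All rows in which the target has not been reached yet
two <- df1 %>%
  group_by(group) %>%
  filter(indi == 0)

# Per group, keep the row with the largest cumulative output:
# the first target-reaching row if one exists, otherwise the last month
three <- bind_rows(one, two) %>%
  group_by(group) %>%
  filter(cumulouput == max(cumulouput)) %>%
  arrange(group)

head(three)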
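
As a possible shortening of the approach above, here is a minimal sketch that uses a single grouped `filter()`. It assumes the data are rebuilt from the table in the question (rather than from `rnorm`, which is not reproducible without a seed) and that `indi` is 1 once `cumulouput` exceeds 40000. The name `result` is only illustrative.

```r
library(dplyr)

# Data re-entered from the table in the question (assumption: these values
# stand in for the random draws produced by rnorm in the original code)
df1 <- data.frame(
  group = rep(c("A", "B", "C"), times = c(3, 6, 6)),
  month = c(1:3, 1:6, 1:6),
  cumulouput = c(9735.37, 20203.43, 31698.17,
                 10186.46, 19957.55, 29829.18, 39706.45, 48715.65, 58590.17,
                 10613.87, 21117.54, 31514.64, 41223.87, 51085.54, 60223.09)
)
df1$indi <- ifelse(df1$cumulouput > 40000, 1, 0)

# Within each group: if the target was reached, keep the first month where
# indi == 1; otherwise keep the last available month
result <- df1 %>%
  group_by(group) %>%
  filter(if (any(indi == 1)) month == min(month[indi == 1]) else month == max(month)) %>%
  ungroup()

print(result)
```

The `if`/`else` is evaluated once per group, so `min(month[indi == 1])` is only computed for groups that actually reach the target.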

Answer 1: (score: -1)

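The logic here is as follows: for each row in a group, check whether indi==1. If TRUE, return the min month among the rows where the target is met; if FALSE, return the max month among the rows where the target is not met. Then filter for the rows whose month matches the m column we just added, and filter on max(indi) within the group to drop the earlier months. Finally, remove the temporary column m.

df1 %>%
  group_by(group) %>%
  mutate(m = if_else(indi == 1, min(.[.$indi == 1, 'month']), max(.[.$indi == 0, 'month']))) %>%
  filter(month == m, indi == max(indi)) %>%
  select(-m)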
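
One caveat with the code in this answer: inside `mutate()`, the magrittr placeholder `.` refers to the whole piped data frame, not to the current group, so the `min` and `max` are taken across all groups. A group-aware variant of the same temporary-column idea might look like the sketch below. It is not the answerer's code: it refers to the columns directly, swaps `if_else` for a per-group `if`/`else`, and rebuilds `df1` from the table in the question. The name `result2` is only illustrative.

```r
library(dplyr)

# Data re-entered from the table in the question (assumed stand-in for the
# random values generated in the original code)
df1 <- data.frame(
  group = rep(c("A", "B", "C"), times = c(3, 6, 6)),
  month = c(1:3, 1:6, 1:6),
  cumulouput = c(9735.37, 20203.43, 31698.17,
                 10186.46, 19957.55, 29829.18, 39706.45, 48715.65, 58590.17,
                 10613.87, 21117.54, 31514.64, 41223.87, 51085.54, 60223.09)
)
df1$indi <- ifelse(df1$cumulouput > 40000, 1, 0)

result2 <- df1 %>%
  group_by(group) %>%
  # m is the month to keep for this group: the first month with indi == 1,
  # or the last month with indi == 0 when the target is never reached
  mutate(m = if (any(indi == 1)) min(month[indi == 1]) else max(month[indi == 0])) %>%
  filter(month == m) %>%
  select(-m) %>%   # drop the temporary column
  ungroup()

print(result2)
```

Because `month` and `indi` are evaluated within each group here, every group gets its own value of `m`.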
