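Grouping data based on conditions from multiple columns

Time: 2017-12-23 20:52:22

Tags: r group-by dplyr

Question:

I am trying to calculate recency for each salesman: the most recent value in the Year column where the targets-achieved indicator equals 1. If the indicator column has 0 as the only available value for the Salesman + Year key, I want the minimum year for that salesman instead.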

Data:

   Salesman_ID  Year         Yearly_Targets_Achieved_Indicator

 1     AA-5468  2012                                 1
 2     AA-5468  2013                                 0
 3     AA-5468  2014                                 0
 4     AA-5468  2015                                 0
 5     AA-5468  2016                                 1
 6     AL-3791  2012                                 1
 7     AL-3791  2013                                 1
 8     AL-3791  2014                                 0
 9     AL-3893  2015                                 0
10     AL-3893  2016                                 0

Expected output:

  Salesman_ID  Year Yearly_Targets_Achieved_Indicator
         <chr> <dbl>                             <dbl>
 1     AA-5468  2016                                 1
 2     AL-3791  2013                                 1
 9     AL-3893  2015                                 0

3 answers:

Answer 0 (score: 0)

Using the tidyverse package, I suggest the following code:

library(tidyverse)

Prashant_df <- data.frame(
    c("AA-5468","AA-5468","AA-5468","AA-5468","AA-5468","AL-3791","AL-3791","AL-3791","AL-3893","AL-3893"),
    c(2012,2013,2014,2015,2016,2012,2013,2014,2015,2016),
    c(1,0,0,0,1,1,1,0,0,0)
)
names(Prashant_df) <- c("Salesman_ID","Year","Yearly_Targets_Achieved_Indicator")

Prashant_df <- Prashant_df %>% 
    group_by(Salesman_ID) %>% 
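    # Rows that met the target keep their own year; rows that did not fall back
    # to the group's earliest year, so max(Year_target) below is the latest hit.
    mutate(Year_target=case_when(
        Yearly_Targets_Achieved_Indicator==1 ~ Year,
        Yearly_Targets_Achieved_Indicator==0 ~ min(Year)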
        ))

Prashant_df_collapsed <- Prashant_df %>% 
    group_by(Salesman_ID) %>% 
    summarise(Year=max(Year_target),
              Yearly_Targets_Achieved_Indicator=max(Yearly_Targets_Achieved_Indicator))

Answer 1 (score: 0)

You can store the maximum and minimum year for each salesman, along with the maximum of the binary indicator.

newdf = df %>% group_by(Salesman_ID) %>% summarise(
  maximum = max(Year),
  minimum = min(Year),
  maxInd = max(Yearly_Targets_Achieved_Indicator))

From here you can construct your result variable.
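One caveat: the group-wide max(Year) is not necessarily a year in which the target was met (for AL-3791 it is 2014, while the expected result is 2013). The following is a minimal sketch of one way to finish the idea, not the answerer's own code. It assumes the question's sample data is stored in `df`, and the column name `lastHit` is made up for this illustration.

```r
library(dplyr)

# Sample data copied from the question; `df` is the name this answer uses.
df <- data.frame(
  Salesman_ID = c("AA-5468", "AA-5468", "AA-5468", "AA-5468", "AA-5468",
                  "AL-3791", "AL-3791", "AL-3791", "AL-3893", "AL-3893"),
  Year = c(2012, 2013, 2014, 2015, 2016, 2012, 2013, 2014, 2015, 2016),
  Yearly_Targets_Achieved_Indicator = c(1, 0, 0, 0, 1, 1, 1, 0, 0, 0),
  stringsAsFactors = FALSE
)

newdf <- df %>%
  group_by(Salesman_ID) %>%
  summarise(
    # Latest year in which the target was met; NA when it never was.
    # `lastHit` is a hypothetical column name used only in this sketch.
    lastHit = if (any(Yearly_Targets_Achieved_Indicator == 1)) {
      max(Year[Yearly_Targets_Achieved_Indicator == 1])
    } else {
      NA_real_
    },
    minimum = min(Year),
    maxInd = max(Yearly_Targets_Achieved_Indicator)
  ) %>%
  # Fall back to the earliest year for salesmen who never met the target.
  mutate(Year = if_else(maxInd == 1, lastHit, minimum))

newdf
```

The `maxInd` column doubles as the indicator value in the expected output, so selecting `Salesman_ID`, `Year` and `maxInd` gives the three columns the question asks for.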

Answer 2 (score: 0)

Using base R:

  c(by(dat,dat[1],function(x)if(all(x[,3]==0)) x[1,2] else max(x[which(x[,3]==1),2])))

   AA-5468 AL-3791 AL-3893 
      2016    2013    2015 

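This code is a bit messy, but it produces the desired output. Here is the explanation:

First group by Salesman_ID, then check whether all indicators in that group are zero. If they are, return the year of the group's first row, which is the minimum year when the rows are sorted by Year, as they are here. Otherwise, return the latest (maximum) year among the rows where the indicator is 1.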
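The same logic can be spelled out with named columns, which may be easier to read than the one-liner. This is a sketch rather than the answerer's code: it assumes the question's sample data is stored in `dat`, the helper name `recency_year` is hypothetical, and it uses min(Year) in place of the first row so that it does not depend on the rows being sorted.

```r
# Sample data copied from the question; `dat` is the name this answer uses.
dat <- data.frame(
  Salesman_ID = c("AA-5468", "AA-5468", "AA-5468", "AA-5468", "AA-5468",
                  "AL-3791", "AL-3791", "AL-3791", "AL-3893", "AL-3893"),
  Year = c(2012, 2013, 2014, 2015, 2016, 2012, 2013, 2014, 2015, 2016),
  Yearly_Targets_Achieved_Indicator = c(1, 0, 0, 0, 1, 1, 1, 0, 0, 0),
  stringsAsFactors = FALSE
)

# `recency_year` is a hypothetical helper introduced only for this sketch.
# It takes the rows of one salesman and returns a single year.
recency_year <- function(x) {
  hit <- x$Yearly_Targets_Achieved_Indicator == 1
  if (any(hit)) {
    max(x$Year[hit])   # latest year in which the target was met
  } else {
    min(x$Year)        # target never met: earliest year on record
  }
}

# Split the data frame by salesman and apply the helper to each piece;
# sapply() collapses the results into a named numeric vector.
result <- sapply(split(dat, dat$Salesman_ID), recency_year)
result
```

Like the original call to by(), this returns a vector named by Salesman_ID rather than a data frame.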