Create a new column based on multiple groups and conditions

Posted: 2018-02-15 23:39:20

Tags: r

I have a large data frame. Here is a simplified example:

df1<- data.frame(nest = c(1:12),
            plot = rep(c("a", "a", "a","b", "b", "b"), times = 2),
            year = rep(c(2015, 2016, 2017), times = 4),
            treatment = rep(c("Control", "Trap","Control","Trap","Control","Control"), times = 2))

which gives:

 nest plot year treatment
  1    a  2015   Control
  2    a  2016      Trap
  3    a  2017   Control
  4    b  2015      Trap
  5    b  2016   Control
  6    b  2017   Control
  7    a  2015   Control
  8    a  2016      Trap
  9    a  2017   Control
 10    b  2015      Trap
 11    b  2016   Control
 12    b  2017   Control

I want to create a new column, prevTrap, according to these rules:

  • Grouped by plot: prevTrap = 1 if the treatment in the previous year was Trap, otherwise 0
  • If year = 2015, prevTrap is always 0

(The same applies to every nest within a given plot/year combination; there can be several nests per combination.)

Desired result:

 nest plot year treatment  prevTrap
  1    a  2015   Control       0
  2    a  2016      Trap       0
  3    a  2017   Control       1
  4    b  2015      Trap       0
  5    b  2016   Control       1
  6    b  2017   Control       0
  7    a  2015   Control       0
  8    a  2016      Trap       0
  9    a  2017   Control       1
 10    b  2015      Trap       0
 11    b  2016   Control       1
 12    b  2017   Control       0

I have tried several variations of the following code, and all of them give prevTrap = 0 for every row:

df2<- df1 %>%
group_by(plot) %>%
mutate(prevTrap = ifelse(treatment == "Trap" &
                        year == year - 1, 
                        "1", "0"))

Should I treat year as a factor or as a number?
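On that last question, a short base-R illustration. The condition year == year - 1 compares each row's year with that same year minus one, so it is FALSE on every row, which is why every prevTrap came out as 0. Keeping year numeric is what makes "the previous year" (year - 1) meaningful; a factor would have to be converted first, and as.numeric() applied directly to a factor returns its internal level codes rather than the years. The vector names year_num and year_fac are illustrative only.

```r
year_num <- c(2015, 2016, 2017)
year_fac <- factor(year_num)

# The condition from the attempt above: a year never equals itself minus one
year_num == year_num - 1            # [1] FALSE FALSE FALSE

# Numeric years support the "previous year" arithmetic directly
year_num - 1                        # [1] 2014 2015 2016

# as.numeric() on a factor returns the level codes, not the years;
# going through as.character() recovers the actual values
as.numeric(year_fac)                # [1] 1 2 3
as.numeric(as.character(year_fac))  # [1] 2015 2016 2017
```

So year is best left numeric: the task is then to compare each row against a different row (the same plot one year earlier), not against its own year.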

2 answers:

Answer 0 (score: 1)

I found a solution that does not depend on how the data frame is sorted:

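library(dplyr)

# Filter to get the plots that were trapped in 2015
Trap2015 <- filter(df1, year == 2015 & treatment == "Trap")
# unique() works whether plot is a factor (R < 4.0) or character (R >= 4.0);
# droplevels() fails on a character column
Trap2015plots <- unique(as.character(Trap2015$plot))
Trap2015plots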

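The code above obviously returns only one plot, "b", but on a larger dataset it produces a list that can be fed into the next part of the code. I did the same thing for 2016 (not shown).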

# Create the prevTrap column: "1" for plots trapped in the previous year
df2 <- df1 %>%
  mutate(prevTrap = ifelse((plot %in% c("b") & year == 2016) |  # plots trapped in 2015
                           (plot %in% c("a") & year == 2017),    # plots trapped in 2016
                           "1", "0"))
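The per-year plot lists above can be generalized so that no plot names or years are hard-coded. Below is a minimal base-R sketch, assuming each plot/year pair can be encoded as a string such as "b_2015" (the name trap_keys and the "_" separator are illustrative choices, not part of the answer). It collects every trapped plot/year key once, then flags a row when the key for its plot and the previous year is in that set. Because it only uses %in%, row order does not matter, and 2015 rows come out as 0 because there is no 2014 key to match.

```r
df1 <- data.frame(nest = 1:12,
                  plot = rep(c("a", "a", "a", "b", "b", "b"), times = 2),
                  year = rep(c(2015, 2016, 2017), times = 4),
                  treatment = rep(c("Control", "Trap", "Control",
                                    "Trap", "Control", "Control"), times = 2))

# One key per trapped plot/year combination, however many nests it has
trap_keys <- unique(paste(df1$plot, df1$year, sep = "_")[df1$treatment == "Trap"])

# 1 when the same plot was trapped in the year before this row's year
df2 <- df1
df2$prevTrap <- as.integer(paste(df2$plot, df2$year - 1, sep = "_") %in% trap_keys)
df2
```

Here trap_keys holds "a_2016" and "b_2015", so only the a/2017 and b/2016 rows get 1, matching the desired result.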

Answer 1 (score: 0)

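This works for your example data frame, but it will only work if your large dataset is structured the same way: years sorted within each group, and the groups' blocks alternating with one another (a, b, a, b, ...). lag() simply takes the previous row within each plot, so it never checks that this row really belongs to the previous year.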

I also named the data frame df1 to avoid confusion with the df() function (stats::df(), the F-distribution density).

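library(tidyverse)
df1 %>%
  group_by(plot) %>%
  # dplyr::lag() takes the previous row within each plot; the first row gives NA
  mutate(prevTrap = ifelse(lag(treatment) == "Trap", 1, 0)) %>%
  ungroup() %>%
  # numeric 0 to match the numeric column (recent tidyr refuses to change its type)
  replace_na(list(prevTrap = 0))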
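The lag() idea can be made independent of row order by first collapsing the data to one row per plot/year, sorting that by year within each plot, and only then looking at the previous row. This is a sketch, assuming dplyr >= 1.0 (for summarise(.groups = )) and assuming a plot/year counts as trapped if any of its nests is marked Trap; plot_years is a hypothetical helper name. The extra year check keeps a gap in the record (for example, a plot with no 2016 rows) from being mistaken for the previous year.

```r
library(dplyr)

df1 <- data.frame(nest = 1:12,
                  plot = rep(c("a", "a", "a", "b", "b", "b"), times = 2),
                  year = rep(c(2015, 2016, 2017), times = 4),
                  treatment = rep(c("Control", "Trap", "Control",
                                    "Trap", "Control", "Control"), times = 2))

# One row per plot/year, so lag() steps back exactly one plot/year, not one nest
plot_years <- df1 %>%
  group_by(plot, year) %>%
  summarise(trapped = any(treatment == "Trap"), .groups = "drop") %>%
  arrange(plot, year) %>%
  group_by(plot) %>%
  mutate(prevTrap = as.integer(lag(trapped, default = FALSE) &
                                 coalesce(lag(year) == year - 1, FALSE))) %>%
  ungroup() %>%
  select(plot, year, prevTrap)

# Attach the plot/year flag back to every nest; left_join keeps df1's row order
df2 <- df1 %>%
  left_join(plot_years, by = c("plot", "year"))
df2
```

Because the flag is computed on the sorted plot/year table and then joined, shuffling the rows of df1 before running this gives the same prevTrap for each nest.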