I have a data frame with three columns of interest: individual ID, trip (ordered within ID), and forage (Yes or No), plus a timestamp:
# trip and forage have length 30, so data.frame() recycles them across both IDs
example <- data.frame(IDs = c(rep("A",30),rep("B",30)),
                      timestamp = seq(c(ISOdate(2016,10,01)), by = "day", length.out = 60),
                      trip = c(rep("1",15),rep("2",15)),
                      forage = c(rep("Yes",3),rep("No",5),rep("Yes",3),rep("No",4),rep("Yes",7),rep("No",8)))
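I want to create two separate columns that number the foraging events for each observation. In the first column, I want to number each run of observations with forage == "Yes" within ID and trip (so each trip within an individual has x foraging events, and the numbering restarts at 1 for the next trip within that individual). The column would look like this: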
example$forageEvent1 <- c(rep(1,3),rep(NA,5),rep(2,3),rep(NA,4),rep(1,7),rep(NA,8),rep(1,3),rep(NA,5),rep(2,3),rep(NA,4),rep(1,7),rep(NA,8))
The second column would number the foraging events by ID only:
example$forageEvent2 <- c(rep(1,3),rep(NA,5),rep(2,3),rep(NA,4),rep(3,7),rep(NA,8),rep(1,3),rep(NA,5),rep(2,3),rep(NA,4),rep(3,7),rep(NA,8))
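I can subset/pipe down to the individual and then the trip, and I have tried ifelse(), but I don't know how to write the code that creates the event sequence. Thanks, everyone.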
Edit: the code below (adapted from a comment) gets close. However, its labels start at "Forage0" instead of "Forage1".
library(dplyr)
Test_example <- example %>%
  group_by(IDs) %>%
  mutate(
    ForagebyID = case_when(
      forage == "Yes" ~ "Forage",
      forage == "No" ~ "NonForage"),
    rleid = cumsum(ForagebyID != lag(ForagebyID, 1, default = "NA")),
    ForagebyID = case_when(
      ForagebyID == "Forage" ~ paste0(ForagebyID, rleid %/% 2),
      TRUE ~ "NonForage"),
    rleid = NULL
  )
Answer 0: (score: 1)
I think this will do what you want:
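library(dplyr)
example <- data.frame(IDs = c(rep("A",30),rep("B",30)),
                      timestamp = seq(c(ISOdate(2016,10,01)), by = "day", length.out = 60),
                      trip = c(rep("1",15),rep("2",15)),
                      forage = c(rep("Yes",3),rep("No",5),rep("Yes",3),rep("No",4),rep("Yes",7),rep("No",8)))
# The cumulative sum of changes in forage gives a run id (1, 2, 3, ...) per group.
# Each group here starts with "Yes", so "Yes" runs have odd ids and
# run_id %/% 2 + 1 numbers them 1, 2, 3, ... Rows with "No" get 0.
# default = "" is a character value that never matches forage, so the first row
# of each group counts as a change (newer dplyr versions reject a numeric default here).
Test_example <- example %>%
  arrange(IDs, timestamp) %>%
  group_by(IDs, trip) %>%
  mutate(forageEvent1 = case_when(forage == "No" ~ 0,
                                  TRUE ~ cumsum(forage != lag(forage, default = "")) %/% 2 + 1)) %>%
  group_by(IDs) %>%
  mutate(forageEvent2 = case_when(forage == "No" ~ 0,
                                  TRUE ~ cumsum(forage != lag(forage, default = "")) %/% 2 + 1))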
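One caveat with the `%/% 2 + 1` approach: it relies on every group starting with a "Yes" run. If a trip began with "No", its first foraging event would be numbered 2. It also returns 0 rather than NA on non-foraging rows. As a minimal sketch of an alternative, assuming forage contains only "Yes" and "No" with no missing values, you could count the rows where a "Yes" run starts. The helper name `number_yes_runs` is made up for illustration and is not part of dplyr.

```r
library(dplyr)

# Same toy data as in the question; trip and forage are recycled across both IDs.
example <- data.frame(
  IDs = c(rep("A", 30), rep("B", 30)),
  timestamp = seq(ISOdate(2016, 10, 1), by = "day", length.out = 60),
  trip = c(rep("1", 15), rep("2", 15)),
  forage = c(rep("Yes", 3), rep("No", 5), rep("Yes", 3), rep("No", 4),
             rep("Yes", 7), rep("No", 8)),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a dplyr function): numbers each run of "Yes" in x
# and returns NA for every other row. Assumes x has no missing values.
number_yes_runs <- function(x) {
  is_yes <- x == "Yes"
  # A run starts where the current row is "Yes" and the previous row is not.
  run_start <- is_yes & !lag(is_yes, default = FALSE)
  # Counting the run starts so far gives the event number for each "Yes" row.
  if_else(is_yes, cumsum(run_start), NA_integer_)
}

result <- example %>%
  arrange(IDs, timestamp) %>%
  group_by(IDs, trip) %>%
  mutate(forageEvent1 = number_yes_runs(forage)) %>%  # restarts for each trip
  group_by(IDs) %>%
  mutate(forageEvent2 = number_yes_runs(forage)) %>%  # restarts for each ID only
  ungroup()
```

Because the helper is called inside a grouped `mutate()`, the numbering restarts at 1 for each group, so the same function serves both columns and only the grouping changes.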