Add a conditional variable to a data frame that takes into account the time interval between rows of the same data frame

Time: 2019-05-24 20:29:31

Tags: r dplyr tidyverse lubridate

I have a data frame df1 that summarizes, in one-hour intervals, the number of times each animal was seen at a given place.

For example:

df1<- data.frame(DateTime=c("2016-09-27 10:00:00","2016-09-27 10:00:00","2016-09-27 11:00:00","2016-09-27 11:00:00","2016-09-27 12:00:00","2016-09-27 12:00:00","2016-09-27 13:00:00","2016-09-27 13:00:00","2016-09-27 14:00:00","2016-09-27 14:00:00","2016-09-27 15:00:00","2016-09-27 15:00:00","2016-09-27 16:00:00","2016-09-27 16:00:00","2016-09-27 17:00:00","2016-09-27 17:00:00","2016-09-27 18:00:00","2016-09-27 18:00:00"),
                 AnimalID= c(8,9,8,9,8,9,8,9,8,9,8,9,8,9,8,9,8,9),
                 Times_seen=c(6,3,0,7,0,2,0,0,7,0,2,0,5,0,2,1,0,8))

> df1
              DateTime AnimalID Times_seen
1  2016-09-27 10:00:00        8          6
2  2016-09-27 10:00:00        9          3
3  2016-09-27 11:00:00        8          0
4  2016-09-27 11:00:00        9          7
5  2016-09-27 12:00:00        8          0
6  2016-09-27 12:00:00        9          2
7  2016-09-27 13:00:00        8          0
8  2016-09-27 13:00:00        9          0
9  2016-09-27 14:00:00        8          7
10 2016-09-27 14:00:00        9          0
11 2016-09-27 15:00:00        8          2
12 2016-09-27 15:00:00        9          0
13 2016-09-27 16:00:00        8          5
14 2016-09-27 16:00:00        9          0
15 2016-09-27 17:00:00        8          2
16 2016-09-27 17:00:00        9          1
17 2016-09-27 18:00:00        8          0
18 2016-09-27 18:00:00        9          8

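Based on this, I want to add a new variable to df1 that indicates whether the animal could have been at the place (not seeing it does not mean it was not there). Obviously, if Times_seen is greater than 0, we put Yes in df1$Presence. However, when Times_seen is 0, I want to consider two options: A) the animal was there but nobody saw it (then Presence is Yes), and B) the animal was not at the place (then Presence is No).

The criterion for deciding that the animal is no longer present is: Times_seen is 0 for that animal in the current hour, and the animal was not seen at the place during the previous two hours either.

An example of what I expect to get: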
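To make the rule concrete, here is a minimal base R sketch that checks a single row by hand: animal 8 at 13:00. The vectors `hours_8` and `seen_8` are illustrative names (not part of df1) holding the animal 8 rows copied from the example data, and the window is assumed to include the current hour plus the two hours before it.

```r
# Sightings of animal 8 only, one value per hour from 10:00 to 18:00
# (values copied from df1; the vector names are hypothetical).
hours_8 <- as.POSIXct("2016-09-27 10:00:00", tz = "UTC") + 3600 * 0:8
seen_8  <- c(6, 0, 0, 0, 7, 2, 5, 2, 0)

# Row to check: animal 8 at 13:00.
target <- as.POSIXct("2016-09-27 13:00:00", tz = "UTC")

# Window = the target hour plus the two hours before it (11:00 to 13:00).
in_window <- hours_8 >= target - 2 * 3600 & hours_8 <= target

# Present if the animal was seen at least once inside the window.
present_1300 <- any(seen_8[in_window] > 0)
present_1300
# FALSE, i.e. Presence would be No for this row
```

Moving `target` to 12:00 pulls the 10:00 sighting into the window, so the same check returns TRUE there.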

> df1
              DateTime AnimalID Times_seen Presence
1  2016-09-27 10:00:00        8          6      Yes
2  2016-09-27 10:00:00        9          3      Yes
3  2016-09-27 11:00:00        8          0      Yes
4  2016-09-27 11:00:00        9          7      Yes
5  2016-09-27 12:00:00        8          0      Yes
6  2016-09-27 12:00:00        9          2      Yes
7  2016-09-27 13:00:00        8          0       No
8  2016-09-27 13:00:00        9          0      Yes
9  2016-09-27 14:00:00        8          7      Yes
10 2016-09-27 14:00:00        9          0      Yes
11 2016-09-27 15:00:00        8          2      Yes
12 2016-09-27 15:00:00        9          0       No
13 2016-09-27 16:00:00        8          5      Yes
14 2016-09-27 16:00:00        9          0       No
15 2016-09-27 17:00:00        8          2      Yes
16 2016-09-27 17:00:00        9          1      Yes
17 2016-09-27 18:00:00        8          0      Yes
18 2016-09-27 18:00:00        9          8      Yes

Does anyone know how to do this?

1 answer:

Answer 0 (score: 2)

As akrun pointed out in one of his comments, this is the code I found useful:

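library(tidyverse)  # dplyr (mutate, group_by) and purrr (map_lgl)
library(lubridate)  # ymd_hms(), hours()
# For each row, check the same animal's sightings in the window [DateTime - 2 h, DateTime + 0 h]
df1 <- df1 %>% mutate(DateTime = ymd_hms(DateTime)) %>%
  group_by(AnimalID) %>%
  mutate(Presence = map_lgl(DateTime, ~ any(Times_seen[dplyr::between(DateTime, .x - hours(2), .x + hours(0))] > 0)))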

> df1
# A tibble: 18 x 4
# Groups:   AnimalID [2]
   DateTime            AnimalID Times_seen Presence
   <dttm>                 <dbl>      <dbl> <lgl>   
 1 2016-09-27 10:00:00        8          6 TRUE    
 2 2016-09-27 10:00:00        9          3 TRUE    
 3 2016-09-27 11:00:00        8          0 TRUE    
 4 2016-09-27 11:00:00        9          7 TRUE    
 5 2016-09-27 12:00:00        8          0 TRUE    
 6 2016-09-27 12:00:00        9          2 TRUE    
 7 2016-09-27 13:00:00        8          0 FALSE   
 8 2016-09-27 13:00:00        9          0 TRUE    
 9 2016-09-27 14:00:00        8          7 TRUE    
10 2016-09-27 14:00:00        9          0 TRUE    
11 2016-09-27 15:00:00        8          2 TRUE    
12 2016-09-27 15:00:00        9          0 FALSE   
13 2016-09-27 16:00:00        8          5 TRUE    
14 2016-09-27 16:00:00        9          0 FALSE   
15 2016-09-27 17:00:00        8          2 TRUE    
16 2016-09-27 17:00:00        9          1 TRUE    
17 2016-09-27 18:00:00        8          0 TRUE    
18 2016-09-27 18:00:00        9          8 TRUE    

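Note: this code lets you choose how many hours before and after each row are taken into account before df1$Presence is set to No (FALSE here). The window is set by `.x - hours(2)` (two hours before) and `.x + hours(0)` (zero hours after).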
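As a sketch of how that window could be made adjustable, the same check can be wrapped in a small function. The helper name `add_presence` and its arguments `hours_before` and `hours_after` are hypothetical (they are not part of the original answer), and the logical result is converted to the Yes/No labels used in the question. The example rebuilds df1 from strings so that it runs on its own.

```r
library(dplyr)
library(purrr)
library(lubridate)

# Same data as in the question, built compactly.
df1 <- data.frame(
  DateTime   = rep(sprintf("2016-09-27 %02d:00:00", 10:18), each = 2),
  AnimalID   = rep(c(8, 9), times = 9),
  Times_seen = c(6, 3, 0, 7, 0, 2, 0, 0, 7, 0, 2, 0, 5, 0, 2, 1, 0, 8)
)

# Hypothetical helper: applies the windowed check per animal and
# exposes the window size as arguments.
add_presence <- function(df, hours_before = 2, hours_after = 0) {
  df %>%
    mutate(DateTime = ymd_hms(DateTime)) %>%
    group_by(AnimalID) %>%
    mutate(
      # TRUE if the animal was seen at least once inside the window
      seen_in_window = map_lgl(
        DateTime,
        ~ any(Times_seen[DateTime >= .x - hours(hours_before) &
                         DateTime <= .x + hours(hours_after)] > 0)
      ),
      Presence = if_else(seen_in_window, "Yes", "No")
    ) %>%
    ungroup() %>%
    select(-seen_in_window)
}

result <- add_presence(df1)  # defaults: 2 hours before, 0 hours after
result$Presence
```

With the defaults this should give No for rows 7, 12 and 14 and Yes elsewhere, matching the expected output in the question. A call such as `add_presence(df1, hours_before = 1)` narrows the window, so more rows turn into No.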