Creating a variable based on lagged observations in R

Posted: 2017-07-18 00:59:32

Tags: r

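I'm trying to create a new variable: when an event occurs, I want to look back over all the previous events within 1 unit of the time variable. I have some sample data below. I'm pretty lost and don't know where to start.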

# Sample play-by-play data: one row per event, in time order
event<-c("Dribble","Pass","Dribble","Bad Shot","Shot Miss","Rebound","Pass","Pump Fake","Good Shot","Shot Miss")
time<-c(1,2,3,4,5,6,6.5,6.9,6.92,6.95)
player_id<-c(1,1,2,2,2,1,1,2,2,2)
pass_to_shot<-c("","Pass to Shot","","","","","Pass to Shot","","","")
test_data<-data.frame(player_id,event,time,pass_to_shot)

player_id   event    time   pass_to_shot
    1      Dribble     1    NA
    1      Pass        2    Pass to Shot
    2      Dribble     3    NA
    2      Bad Shot    4    NA
    2      Shot Miss   5    NA
    1      Rebound     6    NA
    1      Pass       6.5   Pass to Shot
    2      Pump Fake  6.9   NA
    2      Good Shot  6.92  NA
    2      Shot Miss  6.95  NA

I'd like it to look like this:

player_id   event    time   pass_to_shot   chance_create
    1      Dribble     1    NA
    1      Pass        2    Pass to Shot
    2      Dribble     3    NA
    2      Bad Shot    4    NA
    2      Shot Miss   5    NA
    1      Rebound     6    NA
    1      Pass       6.5   Pass to Shot         1
    2      Pump Fake  6.9   NA
    2      Good Shot  6.92  NA
    2      Shot Miss  6.95  NA

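I haven't really figured out how to refer to other observations (rows) in an R data set. Basically, if event == "Pass" and a "Good Shot" event occurs within the next 1 second (one unit of time), then I want chance_create to equal 1. Any help would be great, thanks!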
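Since part of the question is how to refer to neighbouring rows at all: in base R you can shift a vector by one position with plain indexing, padding the gap with NA, which is what `dplyr::lag()` and `dplyr::lead()` do with their default `n = 1`. The helper names `shift_back()` and `shift_forward()` are made up for this sketch.

```r
# Hypothetical helpers: shift a vector by one position, padding with NA
shift_back    <- function(x) c(NA, head(x, -1))  # previous row's value (like a lag)
shift_forward <- function(x) c(tail(x, -1), NA)  # next row's value (like a lead)

event <- c("Dribble", "Pass", "Dribble", "Bad Shot", "Shot Miss", "Rebound",
           "Pass", "Pump Fake", "Good Shot", "Shot Miss")
time  <- c(1, 2, 3, 4, 5, 6, 6.5, 6.9, 6.92, 6.95)

# Put each row next to its neighbours to see what a one-row shift gives you
data.frame(event,
           prev_event  = shift_back(event),
           next_event  = shift_forward(event),
           gap_to_next = shift_forward(time) - time)
```

In this data the Pass at 6.5 is followed by a Pump Fake, not by the Good Shot, so a rule that only looks at the very next row misses it unless the other events are filtered out first.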

3 answers:

Answer 0 (score: 0)

You can do this with dplyr:

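library(dplyr)
# Note: "GoodShot" (no space) never matches "Good Shot", so only Pass rows are flagged
test_data %>%
  mutate(event_of_interest = ifelse(event == "Pass" | event == "GoodShot", 1, 0),
         time_diff = c(diff(-time), NA),  # minus the gap to the next row; NA on the last row
         chance_create = ifelse(abs(time_diff) < 1 & event_of_interest == 1, 1, 0)) %>%
  select(-event_of_interest, -time_diff)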

Output:

          player_id     event time pass_to_shot chance_create
       1          1   Dribble 1.00                          0
       2          1      Pass 2.00 Pass to Shot             0
       3          2   Dribble 3.00                          0
       4          2  Bad Shot 4.00                          0
       5          2 Shot Miss 5.00                          0
       6          1   Rebound 6.00                          0
       7          1      Pass 6.50 Pass to Shot             1
       8          2 Pump Fake 6.90                          0
       9          2 Good Shot 6.92                          0
       10         2 Shot Miss 6.95                          0

That said, I'm not 100% sure my code is robust, i.e., I'm not sure it will always give the desired result.
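One way to avoid depending on which row happens to come next is to compare each Pass against the times of all Good Shots and check whether any of them falls inside the window. This is a base R sketch, not one of the posted answers; `flag_chance_created()` and its `window` argument are hypothetical names. It assumes, like the answers here, that the Good Shot may come from any player, and that "within the next 1 second" means more than 0 and at most 1 time unit after the pass.

```r
# Hypothetical helper: 1 for each "Pass" followed by a "Good Shot" within `window`
# time units (events in between are ignored), 0 for every other row.
flag_chance_created <- function(event, time, window = 1) {
  good_shot_times <- time[event == "Good Shot"]
  vapply(seq_along(event), function(i) {
    if (event[i] != "Pass") return(0)
    gaps <- good_shot_times - time[i]           # time from this pass to each Good Shot
    as.numeric(any(gaps > 0 & gaps <= window))  # at least one Good Shot in (0, window]
  }, numeric(1))
}

test_data <- data.frame(
  player_id = c(1, 1, 2, 2, 2, 1, 1, 2, 2, 2),
  event = c("Dribble", "Pass", "Dribble", "Bad Shot", "Shot Miss", "Rebound",
            "Pass", "Pump Fake", "Good Shot", "Shot Miss"),
  time = c(1, 2, 3, 4, 5, 6, 6.5, 6.9, 6.92, 6.95),
  stringsAsFactors = FALSE
)
test_data$chance_create <- flag_chance_created(test_data$event, test_data$time)
test_data$chance_create
# [1] 0 0 0 0 0 0 1 0 0 0
```

This still flags the Pass at 6.5 even though a Pump Fake sits between it and the Good Shot. It scans every Good Shot time for each Pass, which is simple but grows quadratically on very long event logs.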

Answer 1 (score: 0)

Here is another solution that may be more robust, though it is hard to tell with the current data:

library(dplyr)
test_data %>% 
  filter(event %in% c("Pass", "Good Shot")) %>%   # keep only the events that matter
  arrange(time, event) %>% 
  # flag a row when the next remaining event is a Good Shot less than 1 time unit later
  mutate(chance_create = ifelse((lead(time) - time) < 1 & lead(event) == "Good Shot", 1, NA)) %>% 
  select(player_id, chance_create, time) %>% 
  left_join(test_data, ., by = c("time", "player_id"))
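A note on the direction of the gap in a `lead()` comparison: once the rows are sorted by time, `time - lead(time)` is never positive, so a test like `(time - lead(time)) < 1` holds for every row that has a next row and does not limit the window at all; the gap to the next event is `lead(time) - time`. A small base R illustration, using the three times left after filtering the sample data to Pass / Good Shot (`next_time` is built by indexing as a stand-in for `dplyr::lead()`):

```r
# Times of the Pass / Good Shot rows that remain after filtering the sample data
time <- c(2, 6.5, 6.92)
next_time <- c(time[-1], NA)  # value from the next row, like dplyr::lead(time)

time - next_time        # negative once sorted (NA on the last row)
(time - next_time) < 1  # TRUE for every row with a successor: no real window
next_time - time        # the actual gap to the next event: about 4.5, 0.42, NA
```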

Answer 2 (score: 0)

library(dplyr)
# Keep Pass / Good Shot rows; flag a Pass whose next remaining event is a Good Shot within 1 time unit
z1 <- test_data %>% filter(event == "Pass" | event == "Good Shot") %>%
  mutate(time_diff = c(diff(time), NA),
         chance_create = ifelse(event == "Pass" & lead(event) == "Good Shot" & time_diff <= 1, 1, 0)) %>%
  select(-time_diff)

# Merge the flags back onto all rows; rows that were filtered out get 0
output <- merge(test_data, z1, by = c("player_id", "event", "time", "pass_to_shot"), all.x = TRUE) %>%
  arrange(time)
output$chance_create[is.na(output$chance_create)] <- 0
output

   player_id     event time pass_to_shot chance_create
           1   Dribble 1.00                          0
           1      Pass 2.00 Pass to Shot             0
           2   Dribble 3.00                          0
           2  Bad Shot 4.00                          0
           2 Shot Miss 5.00                          0
           1   Rebound 6.00                          0
           1      Pass 6.50 Pass to Shot             1
           2 Pump Fake 6.90                          0
           2 Good Shot 6.92                          0
           2 Shot Miss 6.95                          0