R中的连续事件分析

时间:2017-01-19 23:41:48

标签: r date events

我需要帮助解决这个问题。我已经检查过其他各种帖子,但我不能把它拼凑起来。我有数据,大约有100,000个运动员记录和他们参加的培训活动。我简化了数据,但这种方法适合整个数据集。

data.frame的代码:

# Fictitious data
days <- seq(as.Date("2016/01/01"), as.Date("2016/01/28"), "days")
events <- c("Run","Swim","Swim","Cycle","Rest","Gym","Swim","Run",
  "Cycle","Run","Swim","Swim","Run","Swim","Cycle","Rest","Gym",
  "Swim","Swim","Swim","Run","Swim","Run","Gym","Rest","Gym",
  "Cycle","Swim")
my.data <- data.frame(athlete = 1, days,events)
# Note - This data repeats for many participants, but I did not include more than 1

我需要标记每周至少完成3次游泳比赛的运动员 连续至少2周。

编辑:我认为这不是正确的。让我们把它变得更复杂一些。假设我们在每周运动员的第一次游泳事件中运行数周,即一组7天而非日历周,开始

更新:我还有另一个挑战,比方说我想要连续10天每5天寻找一次3次游泳活动,任何地方在数据中。

由于

2 个答案:

答案 0 :(得分:3)

你可以进行两步总结,首先计算每位运动员每周的游泳次数,然后检查是否连续几周有超过三次为运动员游泳:

library(dplyr)
library(lubridate)
my.data %>% 
      arrange(days) %>% 
      group_by(athlete, w = week(days)) %>% 
      summarise(n_swim = sum(events == "Swim")) %>% 
      group_by(athlete) %>% 
      summarise(flag = any(diff(w[n_swim >= 3]) == 1))

# A tibble: 1 x 2
#  athlete  flag
#    <dbl> <lgl>
#1       1  TRUE

更新:要设置从第一个游泳开始的一周,请使用which.max()查找第一个Swim出现的索引,然后减去当天的所有日期得到日差,然后如果你做模数(7)计算,周数将从这一天开始:

my.data %>% 
        arrange(days) %>% group_by(athlete) %>% 
        mutate(Swim = events == "Swim", 
               w = as.integer(days - days[which.max(Swim)]) %/% 7) %>%
        # the first swim day is set as zero, a modulo of 7 will give week number 
        # starting from this day            

        group_by(w, add = TRUE) %>% 
        summarise(n_swim = sum(Swim)) %>% 
        group_by(athlete) %>% 
        summarise(flag = any(diff(w[n_swim >= 3]) == 1))

# A tibble: 1 x 2
#  athlete  flag
#    <dbl> <lgl>
#1       1  TRUE

答案 1 :(得分:1)

快速而脏的代码,但检查它是否适用于您的数据集:

library(tidyverse)
library(lubridate)

df %>% 
    mutate(weeknum=week(days)) %>% 
    group_by(athlete,weeknum) %>% 
    filter(events=='Swim') %>% 
    summarise(n=n()) %>% 
    mutate(gt_3=as.numeric(n>=3),
           x=gt_3-lag(gt_3,1),
           flag=x==0) %>% 
    filter(flag==T) %>% 
    select(athlete) %>% 
    distinct()