Tracking specific events in a data frame in R

Time: 2018-03-03 15:45:41

Tags: r dataframe datatable

I have the following table:

Name        Date       Score
John      11-01-02      40
John      11-01-03      47
John      11-01-04      41
John      11-01-05      35
John      11-01-06      52
John      11-01-07      47
John      11-01-08      45
John      11-01-09      43
John      11-01-10      40
Adam      11-01-02      41
Adam      11-01-03      41
Adam      11-01-04      49
Adam      11-01-05      40
Adam      11-01-06      40

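I only want to track the following events: for each student, record when and how many times the student's score 1) increases by 5 or more and then decreases by 5 or more, or 2) decreases by 5 or more and then increases by 5 or more.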

To help with this, I made the following table of score differences for each student.

Name        Date      Difference
John      11-01-03       7
John      11-01-04      -6
John      11-01-05      -6
John      11-01-06      17
John      11-01-07      -5
John      11-01-08      -2
John      11-01-09      -2
John      11-01-10      -3
Adam      11-01-04       8
Adam      11-01-05      -9
Adam      11-01-06       0

For example, on 11-01-03 John's score rose from 40 (on 11-01-02) to 47, so the difference is 47 - 40 = 7.
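A minimal base R sketch of how a difference table like this could be built. The names `scores` and `differences` are hypothetical, and the data is simply retyped from the table above.

```r
# Hypothetical data frame holding the scores from the question
scores <- read.table(text = "
Name Date Score
John 11-01-02 40
John 11-01-03 47
John 11-01-04 41
John 11-01-05 35
John 11-01-06 52
John 11-01-07 47
John 11-01-08 45
John 11-01-09 43
John 11-01-10 40
Adam 11-01-02 41
Adam 11-01-03 41
Adam 11-01-04 49
Adam 11-01-05 40
Adam 11-01-06 40", header = TRUE, stringsAsFactors = FALSE)

# Change from the previous row, computed separately for each student;
# the first row of each student has no predecessor, so it gets NA
scores$Difference <- ave(scores$Score, scores$Name,
                         FUN = function(x) c(NA, diff(x)))

# Drop the first row of each student. Note that this keeps every remaining
# row, including Adam on 11-01-03, whose difference is 0.
differences <- scores[!is.na(scores$Difference), c("Name", "Date", "Difference")]
print(differences)
```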

I would like the following table as output, listing each person's name and the dates of their events:

Name        Dates for Events
John            11-01-03      
John            11-01-05
John            11-01-06
Adam            11-01-04

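On 11-01-03, John's score had changed by 7 and the next change was -6, so John experienced the event I described. The other dates are included for the same reason.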

Is there a simple way to do this in R? Any help would be greatly appreciated.

2 Answers:

Answer 0: (score: 1)

One option using dplyr could be:

library(dplyr)

data %>%
  group_by(Name) %>%
  # diff is the change from this row to the student's next row
  mutate(diff = lead(Score) - Score,
         score_increase_5 = diff >= 5,
         score_decrease_5 = diff <= -5) %>%
  filter(!is.na(diff)) %>%
  # an event is a swing of 5+ that directly follows an opposite swing of 5+
  mutate(event = (score_decrease_5 & lag(score_increase_5)) |
                 (score_increase_5 & lag(score_decrease_5))) %>%
  filter(event) %>%
  ungroup() %>%
  select(Name, Date)
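
The question also asks how many times the event occurs per student. A possible sketch of that count, assuming the scores are in a data frame named `data` as above; the names `events`, `event_counts`, `prev_change`, `next_change` and `n_events` are made up for illustration, and the event test is written with explicit previous and next changes rather than the exact pipeline shown earlier.

```r
library(dplyr)

# Same scores as in the question, retyped here so the sketch is self-contained
data <- read.table(text = "
Name Date Score
John 11-01-02 40
John 11-01-03 47
John 11-01-04 41
John 11-01-05 35
John 11-01-06 52
John 11-01-07 47
John 11-01-08 45
John 11-01-09 43
John 11-01-10 40
Adam 11-01-02 41
Adam 11-01-03 41
Adam 11-01-04 49
Adam 11-01-05 40
Adam 11-01-06 40", header = TRUE, stringsAsFactors = FALSE)

events <- data %>%
  group_by(Name) %>%
  # change into this row and change out of this row, per student
  mutate(prev_change = Score - lag(Score),
         next_change = lead(Score) - Score) %>%
  # rows where either change is NA (first/last row) are dropped by filter()
  filter((prev_change >= 5 & next_change <= -5) |
         (prev_change <= -5 & next_change >= 5)) %>%
  ungroup() %>%
  select(Name, Date)

# Number of events per student
event_counts <- count(events, Name, name = "n_events")
print(event_counts)
```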

Answer 1: (score: 1)

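The idea is to create two columns: the difference from the previous row and the difference to the following row. You can then subset the data.frame with a condition on those two columns.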

Here is a solution using data.table:

library(data.table)
plouf <- read.table(text = "
Name        Date       Score
John      11-01-02      40
John      11-01-03      47
John      11-01-04      41
John      11-01-05      35
John      11-01-06      52
John      11-01-07      47
John      11-01-08      45
John      11-01-09      43
John      11-01-10      40
Adam      11-01-02      41
Adam      11-01-03      41
Adam      11-01-04      49
Adam      11-01-05      40
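Adam      11-01-06      40", header = TRUE)
setDT(plouf)  # convert to a data.table by reference
plouf[, Score := as.numeric(Score)]
# change from the previous row, per student
plouf[, diffprev := c(NA, diff(Score)), by = Name]
# change to the following row, per student
plouf[, difffol := c(diff(Score), NA), by = Name]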

Then make the selection:

plouf[(diffprev >= 5 & difffol <= -5) |(diffprev <= -5 & difffol >= 5),.(Name,Date)]

which gives (shown here with all columns):

> plouf[(diffprev >= 5 & difffol <= -5) |(diffprev <= -5 & difffol >= 5)]
   Name     Date Score diffprev difffol
1: John 11-01-03    47        7      -6
2: John 11-01-05    35       -6      17
3: John 11-01-06    52       17      -5
4: Adam 11-01-04    49        8      -9
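
To also get the number of events per student, one possible extension is to count the selected rows by `Name`. This is only a sketch: it uses `shift()` instead of `diff()` for the two difference columns, and the names `dt`, `threshold`, `events`, `counts` and `n_events` are assumptions introduced for illustration.

```r
library(data.table)

# Same scores as above, built directly as a data.table
dt <- data.table(
  Name  = rep(c("John", "Adam"), c(9, 5)),
  Date  = c(sprintf("11-01-%02d", 2:10), sprintf("11-01-%02d", 2:6)),
  Score = c(40, 47, 41, 35, 52, 47, 45, 43, 40, 41, 41, 49, 40, 40)
)

threshold <- 5  # minimum size of each swing

# shift() lags by default; type = "lead" looks at the following row
dt[, `:=`(diffprev = Score - shift(Score),
          difffol  = shift(Score, type = "lead") - Score),
   by = Name]

# Rows where a swing of at least `threshold` is followed by an opposite one;
# data.table treats NA in the filter as FALSE, so edge rows are dropped
events <- dt[(diffprev >= threshold & difffol <= -threshold) |
             (diffprev <= -threshold & difffol >= threshold)]

# Number of events per student
counts <- events[, .(n_events = .N), by = Name]
print(counts)
```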