I have the following table:
Name Date Score
John 11-01-02 40
John 11-01-03 47
John 11-01-04 41
John 11-01-05 35
John 11-01-06 52
John 11-01-07 47
John 11-01-08 45
John 11-01-09 43
John 11-01-10 40
Adam 11-01-02 41
Adam 11-01-03 41
Adam 11-01-04 49
Adam 11-01-05 40
Adam 11-01-06 40
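I only want to track the following events: for each student, record the dates on which, and the number of times, the student's score either 1) increased by 5 or more and then decreased by 5 or more, or 2) decreased by 5 or more and then increased by 5 or more.
To help with this, I made the following table of score differences for each student: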
Name Date Difference
John 11-01-03 7
John 11-01-04 -6
John 11-01-05 -6
John 11-01-06 17
John 11-01-07 -5
John 11-01-08 -2
John 11-01-09 -2
John 11-01-10 -3
Adam 11-01-03 0
Adam 11-01-04 8
Adam 11-01-05 -9
Adam 11-01-06 0
For example, on 11-01-03 John's score rose from 40 (on 11-01-02) to 47, a difference of 47 - 40 = 7.
I would like the following table as output:
Name Dates for Events
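A minimal sketch of how such a difference table could be computed in base R. The data frame name `scores` and the compact way of rebuilding the table are assumptions made for this illustration; `ave()` applies `diff()` within each student, so the first row of every student has no difference.

```r
# Rebuild the score table (dates kept as plain character strings).
scores <- data.frame(
  Name  = rep(c("John", "Adam"), times = c(9, 5)),
  Date  = sprintf("11-01-%02d", c(2:10, 2:6)),
  Score = c(40, 47, 41, 35, 52, 47, 45, 43, 40,
            41, 41, 49, 40, 40),
  stringsAsFactors = FALSE
)

# Change from the previous row within each student; NA on each student's first row.
scores$Difference <- ave(scores$Score, scores$Name,
                         FUN = function(s) c(NA, diff(s)))

# Keep only the rows that actually have a previous score.
differences <- scores[!is.na(scores$Difference), c("Name", "Date", "Difference")]
```

Grouping with `ave()` keeps the original row order, which is what makes "previous row" mean "previous date" here; that only holds if the rows are already sorted by date within each student.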
John 11-01-03
John 11-01-05
John 11-01-06
Adam 11-01-04
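On 11-01-03, John's score change was 7, followed by -6 on 11-01-04, so John experienced the event I described. The other dates are included for the same reason.
Is there a simple way to do this in R? Any help would be greatly appreciated.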
Answer 0 (score: 1)
One option using dplyr could be:
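library(dplyr)

data %>%
  group_by(Name) %>%
  # diff is the change from this date to the student's next date
  mutate(diff = lead(Score) - Score,
         score_increase_5 = diff >= 5,
         score_decrease_5 = diff <= -5) %>%
  # drop each student's last row, which has no following score
  filter(!is.na(diff)) %>%
  # event: a change of 5 or more into this date, then an opposite one out of it
  mutate(event = (score_decrease_5 & lag(score_increase_5)) |
                 (score_increase_5 & lag(score_decrease_5))) %>%
  filter(event) %>%
  select(Name, Date)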
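The pipeline above assumes a data frame named `data` that the answer never defines. As a hedged sketch, here is one way to build it from the question's table, together with a variant that compares `lag()` and `lead()` directly. The names `events`, `change_in` and `change_out` are made up for this example.

```r
library(dplyr)

# The question's table, rebuilt compactly (dates kept as character strings).
data <- data.frame(
  Name  = rep(c("John", "Adam"), times = c(9, 5)),
  Date  = sprintf("11-01-%02d", c(2:10, 2:6)),
  Score = c(40, 47, 41, 35, 52, 47, 45, 43, 40,
            41, 41, 49, 40, 40),
  stringsAsFactors = FALSE
)

events <- data %>%
  group_by(Name) %>%
  mutate(change_in  = Score - lag(Score),     # change leading into this date
         change_out = lead(Score) - Score) %>% # change leading out of this date
  # filter() drops rows where the condition is NA (first/last row per student)
  filter((change_in >= 5 & change_out <= -5) |
         (change_in <= -5 & change_out >= 5)) %>%
  ungroup() %>%
  select(Name, Date)
```

The number of events per student, which the question also asks for, could then be obtained with `count(events, Name)`.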
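Answer 1 (score: 1)
The idea is to create two columns: the difference from the previous row and the difference to the following row. You can then subset the data.frame with a condition on those two columns.
Here is a solution with data.table:
library(data.table)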
plouf <- read.table(text = "
Name Date Score
John 11-01-02 40
John 11-01-03 47
John 11-01-04 41
John 11-01-05 35
John 11-01-06 52
John 11-01-07 47
John 11-01-08 45
John 11-01-09 43
John 11-01-10 40
Adam 11-01-02 41
Adam 11-01-03 41
Adam 11-01-04 49
Adam 11-01-05 40
Adam 11-01-06 40",header = T)
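setDT(plouf)  # convert to a data.table by reference
plouf[, Score := as.numeric(Score)]
# change from the previous row, per student
plouf[, diffprev := c(NA, diff(Score)), by = Name]
# change to the following row, per student
plouf[, difffol := c(diff(Score), NA), by = Name]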
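Then do the selection:
plouf[(diffprev >= 5 & difffol <= -5) | (diffprev <= -5 & difffol >= 5), .(Name, Date)]
which gives the following (shown here without the column selection, so that both differences are visible):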
> plouf[(diffprev >= 5 & difffol <= -5) |(diffprev <= -5 & difffol >= 5)]
Name Date Score diffprev difffol
1: John 11-01-03 47 7 -6
2: John 11-01-05 35 -6 17
3: John 11-01-06 52 17 -5
4: Adam 11-01-04 49 8 -9
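If neither package is available, the same previous/following idea can be sketched in base R. The helper name `find_reversals` and its `threshold` argument are hypothetical, introduced only for this illustration, and the rows are assumed to be sorted by date within each student.

```r
# Hypothetical helper: return the dates where a change of at least `threshold`
# is immediately followed by an opposite change of at least `threshold`.
find_reversals <- function(df, threshold = 5) {
  # Change into each row and change out of each row, computed per student.
  prev <- ave(df$Score, df$Name, FUN = function(s) c(NA, diff(s)))
  foll <- ave(df$Score, df$Name, FUN = function(s) c(diff(s), NA))
  hit <- (prev >= threshold & foll <= -threshold) |
         (prev <= -threshold & foll >= threshold)
  # which() skips the NA comparisons on each student's first and last rows.
  df[which(hit), c("Name", "Date")]
}

# The question's table, rebuilt compactly.
scores <- data.frame(
  Name  = rep(c("John", "Adam"), times = c(9, 5)),
  Date  = sprintf("11-01-%02d", c(2:10, 2:6)),
  Score = c(40, 47, 41, 35, 52, 47, 45, 43, 40,
            41, 41, 49, 40, 40),
  stringsAsFactors = FALSE
)

res <- find_reversals(scores)   # same four dates as in the output above
table(res$Name)                 # number of events per student
```

Making the threshold a parameter is only a convenience; with `threshold = 10`, for instance, none of the changes in this table qualify.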