Create new variables using historical data [previous period]

Time: 2021-07-21 12:49:38

Tags: r dplyr tidyr

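I want to create two new columns that take the previous period's history [t-1] into account and give the number of new and old activities performed in the current period [data structure shown below]. For example, for row 5 the algorithm should compare the new event "think" with the events of the previous period [read, write]. Since "think" did not occur in t-1, new is 1, and since no old event was used in period 3 [row 5], old is 0.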
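To make the counting rule concrete, here is a minimal base R sketch of the row 5 comparison alone. The variable names `current` and `previous` are only illustrative and are not part of the data set.

```r
# Worked check for arun, period 3 (illustrative variable names)
current  <- c("think")          # events in period t = 3
previous <- c("read", "write")  # events in period t - 1 = 2

old <- sum(current %in% previous)  # events repeated from the previous period
new <- length(current) - old       # events not seen in the previous period

c(new = new, old = old)
```

This gives new = 1 and old = 0, matching the desired values for row 5.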

event<- c('read', 'write', 'read', 'write', 'think', 'read', 'think', 'read')
person<- c('arun', 'arun','arun','arun','arun','john','john', 'john')
time <-  c(1, 1,2,2,3,1,2,3)
df<- data.frame(event, person,time)

    event person time  new  old

    read   arun    1    .    .
    write  arun    1    .    .
    read   arun    2    0    2
    write  arun    2    0    2
    think  arun    3    1    0
    read   john    1    .    .
    think  john    2    1    0
    read   john    3    1    0

Any suggestions on how to achieve this?

1 answer:

Answer 0 (score: 1)

event<- c('read', 'write', 'read', 'write', 'think', 'read', 'think', 'read')
person<- c('arun', 'arun','arun','arun','arun','john','john', 'john')
time <-  c(1, 1,2,2,3,1,2,3)
df<- data.frame(event, person,time)

library(tidyverse, warn.conflicts = FALSE)

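df %>% 
  # collect each person's events per period into a list-column
  group_by(person, time) %>%
  summarise(new = list(event), .groups = 'drop') %>%
  group_by(person) %>%
  # old: events also present in the previous period; new: the remaining ones
  mutate(old = map2_int(new, lag(new), ~ sum(.x %in% .y)),
         new = map_int(new, length) - old) %>%
  # the first period has no history, so report NA there
  mutate(across(new:old, ~ifelse(time == 1, NA, .))) %>%
  # attach the per-period counts back to the original rows
  left_join( df, ., by = c('person', 'time'))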

#>   event person time new old
#> 1  read   arun    1  NA  NA
#> 2 write   arun    1  NA  NA
#> 3  read   arun    2   0   2
#> 4 write   arun    2   0   2
#> 5 think   arun    3   1   0
#> 6  read   john    1  NA  NA
#> 7 think   john    2   1   0
#> 8  read   john    3   1   0

Created on 2021-07-21 by the reprex package (v2.0.0)
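For readers who want to see the comparison step spelled out, the same counts can be sketched in base R with an explicit loop. This is only an illustration, not a replacement for the pipeline above. It assumes periods are consecutive integers, so that the previous period is `t - 1` (the pipeline uses `lag()`, which takes the previous observed period instead), and `res`, `current` and `previous` are names introduced here.

```r
df <- data.frame(
  event  = c("read", "write", "read", "write", "think", "read", "think", "read"),
  person = c("arun", "arun", "arun", "arun", "arun", "john", "john", "john"),
  time   = c(1, 1, 2, 2, 3, 1, 2, 3)
)

res <- df
res$new <- NA_integer_
res$old <- NA_integer_

for (i in seq_len(nrow(res))) {
  p <- res$person[i]
  t <- res$time[i]
  if (t == 1) next  # first period: no history, keep NA

  # events of this person in the current and the previous period
  # (assumption: the previous period is exactly t - 1)
  current  <- df$event[df$person == p & df$time == t]
  previous <- df$event[df$person == p & df$time == t - 1]

  res$old[i] <- sum(current %in% previous)    # repeated from t - 1
  res$new[i] <- length(current) - res$old[i]  # not seen in t - 1
}

res
```

Each row receives the counts of its whole period, which is why both rows of arun's period 2 show new = 0 and old = 2.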
