Keep records from before and after a specific timestamp

Posted: 2019-10-31 12:45:50

Tags: r dplyr

I have a data frame that provides specific timestamps:

dframe1 <- structure(list(id = c(1L, 1L, 1L, 2L, 2L), name = c("Google", 
"Yahoo", "Amazon", "Amazon", "Google"), date = c("2008-11-01", 
"2008-11-01", "2008-11-04", "2008-11-01", "2008-11-02")), class = "data.frame", row.names = c(NA, 
-5L))

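The second data frame holds the information I want to keep from the day before and the day after each date in the first data frame: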

dframe2 <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 2L), date = c("2008-11-01", "2008-11-01", 
"2008-11-04", "2008-10-31", "2008-10-31", "2008-11-02", "2008-11-02", 
"2008-11-02", "2008-11-05", "2008-11-02", "2008-11-03", "2008-10-31", 
"2008-11-01", "2008-11-01", "2008-11-02", "2008-11-02", "2008-11-03"
), text_sth = c("test", "text_sth", "text here", "another text", 
"other", "another one", "test", "text_sth", "text here", "another text", 
"other", "etc", "test", "text_sth", "text here", "another text", 
"text here")), row.names = c(NA, -17L), class = "data.frame")

How can I get this output?

id                               text_sth   name label
1                     another text other Google   before
1 another one test text_sth another text Google after
1                     another text other  Yahoo   before
1 another one test text_sth another text  Yahoo after
1                                  other Amazon   before
1                              text here Amazon after

Here is what I tried:

library(dplyr)
dframe1 %>%
   mutate(date = as.Date(date), date1 = date) %>%
   group_by(id) %>%
   tidyr::complete(date1 = seq(date1 - 1, date1 + 1, by = "1 day")) %>%
   filter(date1 != date | is.na(date)) %>%
   select(-date) %>%
   mutate(col = c("before", "after")) %>%
   rename(date = 3) %>%
   inner_join(dframe2 %>% mutate(date = as.Date(date)))

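The ids in dframe1 are the same as in dframe2. For each id, I want to use the dframe1 dates and keep that user's activity from the day before and the day after each dframe1 date. Finally, I want a data frame that contains the id, the merged text column, the name from dframe1, and a before/after label saying whether the text comes from the day before or the day after the dframe1 date.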
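As far as I can tell, the attempt breaks at the complete() step: within each id group, date1 holds several dates, while seq() on Dates expects a single `from` value. The snippet below reproduces that in isolation with two of the id 1 dates from dframe1, then builds the one-day window around each date separately with lapply(). It is an illustrative sketch; the exact error text surfaced through tidyr::complete() may differ.

```r
# Two of the id 1 dates from dframe1: a Date vector of length 2
d <- as.Date(c("2008-11-01", "2008-11-04"))

# seq() with a length-2 `from` errors; capture the message instead of stopping
msg <- tryCatch(
  seq(d - 1, d + 1, by = "1 day"),
  error = function(e) conditionMessage(e)
)
print(msg)

# Building the window one date at a time works: each x is a single Date
windows <- lapply(d, function(x) seq(x - 1, x + 1, by = "1 day"))
print(windows[[2]])  # 2008-11-03, 2008-11-04, 2008-11-05
```

Note that the window also contains the dframe1 date itself, which still has to be filtered out before labelling.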

Answer (score: 2)

  1. Convert the date strings to actual Date objects.
library(dplyr)

dframe1 <- mutate(dframe1, date = as.Date(date))
dframe2 <- mutate(dframe2, date = as.Date(date))
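The conversion matters because step 3 subtracts dates: the difference of two Date objects is a difftime in days whose sign tells which date comes first, while the original character strings cannot be subtracted at all. A quick check with two dates from the question:

```r
x <- as.Date("2008-11-01")
y <- as.Date("2008-11-02")

print(y - x)              # a difftime: "Time difference of 1 days"
print(as.numeric(y - x))  # 1, a plain number that abs() and == can compare
print(as.numeric(x - y))  # -1: the sign says which date is earlier

# Character strings have no subtraction method, so this errors
msg <- tryCatch("2008-11-02" - "2008-11-01", error = function(e) conditionMessage(e))
print(msg)                # "non-numeric argument to binary operator"
```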
  2. Collapse the text_sth values within each id/date group of dframe2; they will always appear together in the output.
df2 <- 
  dframe2 %>% 
  group_by(id, date) %>% 
  summarise(text_sth = paste(text_sth, collapse = " "))

df2
#> # A tibble: 10 x 3
#> # Groups:   id [2]
#>       id date       text_sth                              
#>    <int> <date>     <chr>                                 
#>  1     1 2008-10-31 another text other                    
#>  2     1 2008-11-01 test text_sth                         
#>  3     1 2008-11-02 another one test text_sth another text
#>  4     1 2008-11-03 other                                 
#>  5     1 2008-11-04 text here                             
#>  6     1 2008-11-05 text here                             
#>  7     2 2008-10-31 etc                                   
#>  8     2 2008-11-01 test text_sth                         
#>  9     2 2008-11-02 text here another text                
#> 10     2 2008-11-03 text here
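For comparison, the same collapse can be done in base R with aggregate(); this is a swapped-in alternative, not part of the original answer. paste() receives each group's text_sth values in their original row order, so the strings match df2 above, but aggregate() may return the groups in a different order, so the sketch re-sorts them by id and date.

```r
# dframe2 from the question, rebuilt compactly with dates already converted (step 1)
dframe2 <- data.frame(
  id = c(rep(1L, 11), rep(2L, 6)),
  date = as.Date(c("2008-11-01", "2008-11-01", "2008-11-04", "2008-10-31", "2008-10-31",
                   "2008-11-02", "2008-11-02", "2008-11-02", "2008-11-05", "2008-11-02",
                   "2008-11-03", "2008-10-31", "2008-11-01", "2008-11-01", "2008-11-02",
                   "2008-11-02", "2008-11-03")),
  text_sth = c("test", "text_sth", "text here", "another text", "other", "another one",
               "test", "text_sth", "text here", "another text", "other", "etc",
               "test", "text_sth", "text here", "another text", "text here")
)

# Collapse text_sth per (id, date); `collapse = " "` is passed through to paste()
df2_base <- aggregate(text_sth ~ id + date, data = dframe2, FUN = paste, collapse = " ")

# Sort by id, then date, to line up with the dplyr result
df2_base <- df2_base[order(df2_base$id, df2_base$date), ]
rownames(df2_base) <- NULL
print(df2_base)
```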
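  3. Everything else: join by id, keep only the rows where the date from the second data frame is exactly one day before or after the date from the first (a difference of -1 or 1), and fill the label variable according to the sign.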
left_join(dframe1, df2, by = "id") %>% 
  mutate(date_diff = as.numeric(date.y - date.x)) %>%
  filter(abs(date_diff) == 1) %>% 
  mutate(label = ifelse(date_diff == -1, "before", "after")) %>% 
  select(id, name, label, text_sth)
#>    id   name  label                               text_sth
#> 1   1 Google before                     another text other
#> 2   1 Google  after another one test text_sth another text
#> 3   1  Yahoo before                     another text other
#> 4   1  Yahoo  after another one test text_sth another text
#> 5   1 Amazon before                                  other
#> 6   1 Amazon  after                              text here
#> 7   2 Amazon before                                    etc
#> 8   2 Amazon  after                 text here another text
#> 9   2 Google before                          test text_sth
#> 10  2 Google  after                              text here
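A possible variation on step 3, sketched here rather than taken from the answer: instead of pairing every dframe1 row with every df2 date of the same id and then filtering, build the two target days (date - 1 and date + 1) up front and join on both id and date. This keeps the intermediate table small, and with dplyr 1.1 or later it should also avoid the many-to-many join warning that left_join(..., by = "id") can raise on this data. The helper objects d1 and row are hypothetical names, used only to restore the original row order.

```r
library(dplyr)

# dframe1 from the question, with dates already converted (step 1)
dframe1 <- data.frame(
  id   = c(1L, 1L, 1L, 2L, 2L),
  name = c("Google", "Yahoo", "Amazon", "Amazon", "Google"),
  date = as.Date(c("2008-11-01", "2008-11-01", "2008-11-04", "2008-11-01", "2008-11-02"))
)

# df2 as produced by step 2 (collapsed text per id and date), typed in directly
df2 <- data.frame(
  id   = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  date = as.Date(c("2008-10-31", "2008-11-01", "2008-11-02", "2008-11-03", "2008-11-04",
                   "2008-11-05", "2008-10-31", "2008-11-01", "2008-11-02", "2008-11-03")),
  text_sth = c("another text other", "test text_sth",
               "another one test text_sth another text", "other", "text here",
               "text here", "etc", "test text_sth", "text here another text", "text here")
)

# One row per dframe1 row and target day; `row` remembers the original position
d1 <- mutate(dframe1, row = row_number())
targets <- bind_rows(
  mutate(d1, label = "before", date = date - 1),
  mutate(d1, label = "after",  date = date + 1)
)

result <- targets %>%
  inner_join(df2, by = c("id", "date")) %>%  # at most one df2 row per (id, date)
  arrange(row, label != "before") %>%        # FALSE sorts first, so "before" precedes "after"
  select(id, name, label, text_sth)

print(result)
```

Because df2 has exactly one row per (id, date), each target day matches at most once, so no filtering on a date difference is needed afterwards.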