I have a data frame that provides specific timestamps:
dframe1 <- structure(list(id = c(1L, 1L, 1L, 2L, 2L), name = c("Google",
"Yahoo", "Amazon", "Amazon", "Google"), date = c("2008-11-01",
"2008-11-01", "2008-11-04", "2008-11-01", "2008-11-02")), class = "data.frame", row.names = c(NA,
-5L))
The second data frame holds the information I want to keep for specific times before and after the dates in the first data frame:
dframe2 <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L), date = c("2008-11-01", "2008-11-01",
"2008-11-04", "2008-10-31", "2008-10-31", "2008-11-02", "2008-11-02",
"2008-11-02", "2008-11-05", "2008-11-02", "2008-11-03", "2008-10-31",
"2008-11-01", "2008-11-01", "2008-11-02", "2008-11-02", "2008-11-03"
), text_sth = c("test", "text_sth", "text here", "another text",
"other", "another one", "test", "text_sth", "text here", "another text",
"other", "etc", "test", "text_sth", "text here", "another text",
"text here")), row.names = c(NA, -17L), class = "data.frame")
How can I get this output?
id  text_sth                                name    label
1   another text other                      Google  before
1   another one test text_sth another text  Google  after
1   another text other                      Yahoo   before
1   another one test text_sth another text  Yahoo   after
1   other                                   Amazon  before
1   text here                               Amazon  after
This is what I tried:
library(dplyr)
dframe1 %>%
  mutate(date = as.Date(date), date1 = date) %>%
  group_by(id) %>%
  tidyr::complete(date1 = seq(date1 - 1, date1 + 1, by = "1 day")) %>%
  filter(date1 != date | is.na(date)) %>%
  select(-date) %>%
  mutate(col = c("before", "after")) %>%
  rename(date = 3) %>%
  inner_join(dframe2 %>% mutate(date = as.Date(date)))
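dframe1 contains the same ids as dframe2. For each id, I want to take the dates in dframe1 and keep that user's activity from dframe2 on the day before and the day after each of those dates. The end result should be a data frame containing the id, the merged text column, the name from dframe1, and a before/after label indicating whether the text comes from the day before or the day after the dframe1 date.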
Answer 0 (score: 2)
library(dplyr)
dframe1 <- mutate(dframe1, date = as.Date(date))
dframe2 <- mutate(dframe2, date = as.Date(date))
Collapse the values of text_sth within each id and date group. They will always appear together in the output.
df2 <-
  dframe2 %>%
  group_by(id, date) %>%
  summarise(text_sth = paste(text_sth, collapse = " "))
df2
#> # A tibble: 10 x 3
#> # Groups: id [2]
#> id date text_sth
#> <int> <date> <chr>
#> 1 1 2008-10-31 another text other
#> 2 1 2008-11-01 test text_sth
#> 3 1 2008-11-02 another one test text_sth another text
#> 4 1 2008-11-03 other
#> 5 1 2008-11-04 text here
#> 6 1 2008-11-05 text here
#> 7 2 2008-10-31 etc
#> 8 2 2008-11-01 test text_sth
#> 9 2 2008-11-02 text here another text
#> 10 2 2008-11-03 text here
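Join on id and keep only the rows where the date in the second data frame differs from the date in the first by 1 or -1. Fill in the label variable according to the sign of that difference.
left_join(dframe1, df2, by = "id") %>%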
  mutate(date_diff = as.numeric(date.y - date.x)) %>%
  filter(abs(date_diff) == 1) %>%
  mutate(label = ifelse(date_diff == -1, "before", "after")) %>%
  select(id, name, label, text_sth)
#> id name label text_sth
#> 1 1 Google before another text other
#> 2 1 Google after another one test text_sth another text
#> 3 1 Yahoo before another text other
#> 4 1 Yahoo after another one test text_sth another text
#> 5 1 Amazon before other
#> 6 1 Amazon after text here
#> 7 2 Amazon before etc
#> 8 2 Amazon after text here another text
#> 9 2 Google before test text_sth
#> 10 2 Google after text here
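
As a possible variation on the join step (a sketch, not part of the answer above), the day before and the day after can be computed for each dframe1 row first and then matched on id and date. This avoids pairing every dframe1 row with every df2 row of the same id before filtering. The targets table and the target_date column are names invented for this illustration. The data is rebuilt inline so the block runs on its own, and dplyr 1.0 or later is assumed for the .groups argument.

```r
library(dplyr)

# Same data as in the question, with dates already converted to Date.
dframe1 <- data.frame(
  id   = c(1L, 1L, 1L, 2L, 2L),
  name = c("Google", "Yahoo", "Amazon", "Amazon", "Google"),
  date = as.Date(c("2008-11-01", "2008-11-01", "2008-11-04",
                   "2008-11-01", "2008-11-02"))
)
dframe2 <- data.frame(
  id = c(rep(1L, 11), rep(2L, 6)),
  date = as.Date(c("2008-11-01", "2008-11-01", "2008-11-04", "2008-10-31",
                   "2008-10-31", "2008-11-02", "2008-11-02", "2008-11-02",
                   "2008-11-05", "2008-11-02", "2008-11-03", "2008-10-31",
                   "2008-11-01", "2008-11-01", "2008-11-02", "2008-11-02",
                   "2008-11-03")),
  text_sth = c("test", "text_sth", "text here", "another text", "other",
               "another one", "test", "text_sth", "text here", "another text",
               "other", "etc", "test", "text_sth", "text here", "another text",
               "text here")
)

# One row per id and date, with the texts of that day pasted together.
df2 <- dframe2 %>%
  group_by(id, date) %>%
  summarise(text_sth = paste(text_sth, collapse = " "), .groups = "drop")

# Hypothetical helper table: two copies of dframe1, one pointing at the
# previous day and one at the next day.
targets <- bind_rows(
  mutate(dframe1, label = "before", target_date = date - 1),
  mutate(dframe1, label = "after",  target_date = date + 1)
)

# Keep only the target days that actually have activity in df2.
result <- targets %>%
  inner_join(df2, by = c("id", "target_date" = "date")) %>%
  select(id, name, label, text_sth)

print(result)
```

The rows come out grouped by label rather than by name, so add an arrange() call if the ordering of the answer's output matters.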