
Question

I have this data frame:

df <- data.frame(id = c(1,1,2), 
                 date = c("2008-08-04 05:45:07","2008-08-04 09:45:07","2008-08-04 05:45:07"), 
                 text = c("stg","another","final"))

I want to aggregate it to get this output:

data.frame(id = c(1,2), 
           date = c("2008-08-04", "2008-08-04"), 
           text = c("stg another","final"))

I tried the following, but it just returns the input rows:

aggregate(text ~ date + id, df, paste, sep = " ")

How can I fix this?

Answer 1

Here is a suggestion using dplyr:

library(dplyr)

df %>% 
  arrange(date) %>%
  mutate(date_day = format(as.Date(date, "%Y-%m-%d %H:%M:%S"), "%Y-%m-%d")) %>% 
  group_by(id, date_day) %>% 
  summarise(text = paste(text, collapse=" "))

which returns:

# A tibble: 2 x 3
# Groups:   id [2]
     id date_day   text       
  <dbl> <chr>      <chr>      
1     1 2008-08-04 stg another
2     2 2008-08-04 final

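Explanation:

Sort by the datetime column date, so that the string concatenation in the final summarise step picks up the strings in chronological order.
Extract the day part from the datetime into date_day.
Group by id and date_day.
For each id and date_day combination, concatenate all elements of text, using " " (a single space) as the separator.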
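
The last step depends on paste's collapse argument rather than sep, which is also why the attempt in the question left the strings unjoined. A minimal sketch of the difference, using a made-up vector x that is not part of the question's data:

```r
# Illustrative vector (an assumption for this sketch, not the question's data)
x <- c("stg", "another")

# sep is placed between the *arguments* of paste(); with a single
# vector argument there is nothing to separate, so x comes back unchanged
with_sep <- paste(x, sep = " ")

# collapse joins the *elements* of the result into one string
with_collapse <- paste(x, collapse = " ")

print(with_sep)
print(with_collapse)
```

Inside summarise or aggregate, each group's text values arrive as one vector, so collapse is the argument that merges them into a single string.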

Edit:

A solution using base R:

aggregate(text ~ format(as.Date(date, "%Y-%m-%d %H:%M:%S"), "%Y-%m-%d") + id, df, paste, collapse = " ")
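
Putting the date conversion inside the formula makes the whole expression the name of the result column. A possible variant, sketched under the assumption that every timestamp is a "YYYY-MM-DD HH:MM:SS" string, builds the day column first. The column name date_day and the use of substr are choices made for this sketch:

```r
df <- data.frame(id = c(1, 1, 2),
                 date = c("2008-08-04 05:45:07", "2008-08-04 09:45:07", "2008-08-04 05:45:07"),
                 text = c("stg", "another", "final"))

# date_day is a helper column holding only the calendar day
# (the first 10 characters of the timestamp string)
df$date_day <- substr(df$date, 1, 10)

# Sort by the full timestamp so the texts are concatenated chronologically
df <- df[order(df$date), ]

# One row per date_day/id pair, with the texts joined by a space
res <- aggregate(text ~ date_day + id, data = df, FUN = paste, collapse = " ")
print(res)
```

If the timestamps can come in other formats, converting with as.Date as in the line above is the safer choice than cutting the string by position.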

Answer 2

For the question as you originally asked it, I would do this:

library(tidyverse)

df <- data.frame(id = c(1,1,2), 
                 date = c("2008-08-04 05:45:07","2008-08-04 09:45:07","2008-08-04 05:45:07"), 
                 text = c("stg","another","final")) %>%
  mutate(date = str_sub(date, 1, 10))


aggregate(text ~ date + id, df, paste, collapse = " ")

Aggregate data with the same id and date
