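I have a data frame with 213 rows and 2 columns (Date and Article). The end goal is to reduce the number of rows by grouping the dates by quarter. Naturally, I would like the text in the Article column to be merged accordingly.
Let's take an example.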
Date <- c("2000-01-05", "2000-02-03", "2000-03-02", "2000-03-30", "2000-04-13", "2000-05-11", "2000-06-08", "2000-07-06", "2000-09-14", "2000-10-05", "2000-10-19", "2000-11-02", "2000-12-14")
Article <- c("Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text")
Date <- data.frame(Date)
Article <- data.frame(Article)
df <- cbind(Date, Article)
#Dataframe
Date Article
1 2000-01-05 Long Text
2 2000-02-03 Long Text
3 2000-03-02 Long Text
4 2000-03-30 Long Text
5 2000-04-13 Long Text
6 2000-05-11 Long Text
7 2000-06-08 Long Text
8 2000-07-06 Long Text
9 2000-09-14 Long Text
10 2000-10-05 Long Text
11 2000-10-19 Long Text
12 2000-11-02 Long Text
13 2000-12-14 Long Text
The final output I would like to obtain is the following:
Date Article
1 2000 Q1 Long Text, Long Text, Long Text, Long Text
2 2000 Q2 Long Text, Long Text, Long Text
3 2000 Q3 Long Text, Long Text
4 2000 Q4 Long Text, Long Text, Long Text, Long Text
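Essentially, the rows have been grouped by quarter, with the corresponding text merged together.
I tried looking around, but unfortunately I could not figure out how to do this.
Can anyone help me?
Thanks!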
Answer 0 (score: 3)
One option using dplyr and lubridate could be:
library(dplyr)
library(lubridate)
df %>%
  group_by(Date = as.character(lubridate::quarter(ymd(Date), with_year = TRUE))) %>%
  summarise(Article = paste0(Article, collapse = ","))
Date Article
<chr> <chr>
1 2000.1 Long Text,Long Text,Long Text,Long Text
2 2000.2 Long Text,Long Text,Long Text
3 2000.3 Long Text,Long Text
4 2000.4 Long Text,Long Text,Long Text,Long Text
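The labels above come out as 2000.1 rather than the 2000 Q1 format requested in the question. A minimal sketch of one way to build that label, assuming dplyr and lubridate are installed and that Date holds ISO-formatted strings. The shortened sample data and the name `out` are illustrative only, not part of the original answer.

```r
library(dplyr)
library(lubridate)

# Shortened, illustrative sample data (not the original 213 rows)
df <- data.frame(
  Date = c("2000-01-05", "2000-02-03", "2000-04-13", "2000-07-06", "2000-10-05"),
  Article = c("A", "B", "C", "D", "E")
)

out <- df %>%
  mutate(Date = ymd(Date)) %>%                                  # parse strings into Date objects
  group_by(Date = paste0(year(Date), " Q", quarter(Date))) %>%  # build a "2000 Q1" style label
  summarise(Article = paste(Article, collapse = ", "))          # merge the text within each quarter

out
```

Because the label is a plain character string, the rows sort alphabetically, which matches chronological order here only because the year comes first.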
Answer 1 (score: 2)
We can use as.yearqtr from zoo to summarise:
library(zoo)
library(data.table)
setDT(df)[, .(Article = toString(Article)),.(Date = as.yearqtr(as.IDate(Date)))]
# Date Article
#1: 2000 Q1 Long Text, Long Text, Long Text, Long Text
#2: 2000 Q2 Long Text, Long Text, Long Text
#3: 2000 Q3 Long Text, Long Text
#4: 2000 Q4 Long Text, Long Text, Long Text, Long Text
Answer 2 (score: 1)
Base R solution:
# Concatenate Article row-wise within each year-quarter group.
# collapse (not sep) is what joins the elements of one vector into a single string.
aggregate(list(Article = df$Article),
          by = list(Date = paste(gsub("[-].*", "", df$Date), quarters(df$Date), sep = " ")),
          paste, collapse = ", ")
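When merging text per group with paste, the choice between `sep` and `collapse` matters: `sep` only separates different arguments, while `collapse` joins the elements of a single vector. A small base R sketch of the difference, using made-up sample values and illustrative names (`with_sep`, `with_collapse`, `res`):

```r
x <- c("Long Text", "Long Text", "Long Text")

# sep separates *different* arguments, so a single vector comes back unchanged
with_sep <- paste(x, sep = ", ")

# collapse joins the elements of one vector into a single string
with_collapse <- paste(x, collapse = ", ")

length(with_sep)       # still one element per input element
length(with_collapse)  # a single string

# The same idea inside aggregate(), on a tiny illustrative data frame
df <- data.frame(
  Date = as.Date(c("2000-01-05", "2000-02-03", "2000-04-13")),
  Article = c("A", "B", "C")
)
res <- aggregate(
  list(Article = df$Article),
  by = list(Date = paste(format(df$Date, "%Y"), quarters(df$Date))),
  FUN = paste, collapse = ", "
)
res
```

Here `format(df$Date, "%Y")` is used in place of the `gsub` call to extract the year; both should give the same label for Date columns.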
Data:
df <- data.frame(Date = as.Date(c("2000-01-05",
"2000-02-03",
"2000-03-02",
"2000-03-30", "2000-04-13", "2000-05-11", "2000-06-08",
"2000-07-06", "2000-09-14", "2000-10-05", "2000-10-19",
"2000-11-02", "2000-12-14"),
"%Y-%m-%d"),
Article = c("Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text","Long Text"))
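Note that this data frame stores Date as a Date object, whereas the question builds it from character strings. `quarters()` has methods for date classes, so a character column would need converting first. A minimal sketch of that conversion, assuming ISO-formatted strings (the two-row data frame is illustrative only):

```r
# Illustrative two-row data frame with Date stored as text, as in the question
df <- data.frame(
  Date = c("2000-01-05", "2000-09-14"),
  Article = c("Long Text", "Long Text")
)

# Convert the text column to Date class before calling quarters()
df$Date <- as.Date(df$Date, format = "%Y-%m-%d")

class(df$Date)
quarters(df$Date)
```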