Concatenate text consecutively based on a condition

Date: 2014-07-13 00:03:59

Tags: r text merge concatenation string-concatenation

I have the following data:

a <- structure(list(Title = c("AAADE", "BBBCF", "NBNJHB", "TTTTT", "VVVFF", 
"AASFE", "DDDFFF", "ERFRR", "AAAAAA", "ERERE"), 
Year = c("2004", "2004", "2004", "2004", "2004", "2004", "2005", "2005", "2005", "2005")),
.Names = c("Title", "Year"), row.names = c(NA, -10L), class = "data.frame")
a
    Title Year
1   AAADE 2004
2   BBBCF 2004
3  NBNJHB 2004
4   TTTTT 2004
5   VVVFF 2004
6   AASFE 2004
7  DDDFFF 2005
8   ERFRR 2005
9  AAAAAA 2005
10  ERERE 2005

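I want to concatenate the Title values of all rows that share the same Year. I have tried functions from the 'tm' package, but none of them gives me the following result: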

Title                                     Year      
AAADE BBBCF NBNJHB TTTTT VVVFF AASFE      2004
DDDFFF ERFRR AAAAAA ERERE                 2005

2 answers:

Answer 0 (score: 3)

A more direct approach is to use aggregate:

aggregate(Title ~ Year, a, paste, collapse = " ")
#   Year                                Title
# 1 2004 AAADE BBBCF NBNJHB TTTTT VVVFF AASFE
# 2 2005            DDDFFF ERFRR AAAAAA ERERE

If the order of the columns matters to you, you can use aggregate(Title ~ Year, a, paste, collapse = " ")[names(a)].
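A self-contained sketch of that reordering. It rebuilds the example data with data.frame() instead of structure() (same values; the rebuild and the name res are my own shorthand, not from the answer):

```r
# Rebuild the example data from the question
a <- data.frame(
  Title = c("AAADE", "BBBCF", "NBNJHB", "TTTTT", "VVVFF",
            "AASFE", "DDDFFF", "ERFRR", "AAAAAA", "ERERE"),
  Year  = c(rep("2004", 6), rep("2005", 4)),
  stringsAsFactors = FALSE
)

# Concatenate the titles within each year; aggregate() puts the grouping column first
res <- aggregate(Title ~ Year, a, paste, collapse = " ")

# Indexing by names(a) restores the original column order (Title, then Year)
res <- res[names(a)]
print(res)
```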

Beyond aggregate, you can also look at "data.table" and "dplyr", both of which should be more efficient on larger datasets.

Here it is with "dplyr":

library(dplyr)
a %>% group_by(Year) %>% summarise(Title = paste(Title, collapse = " "))
# Source: local data frame [2 x 2]
# 
#   Year                                Title
# 1 2004 AAADE BBBCF NBNJHB TTTTT VVVFF AASFE
# 2 2005            DDDFFF ERFRR AAAAAA ERERE

Here it is with "data.table":

library(data.table)
A <- as.data.table(a)
A[, list(Title = paste(Title, collapse = " ")), by = Year]
#    Year                                Title
# 1: 2004 AAADE BBBCF NBNJHB TTTTT VVVFF AASFE
# 2: 2005            DDDFFF ERFRR AAAAAA ERERE

Answer 1 (score: 2)

with(a, data.frame(Title = tapply(Title, Year, paste, collapse = ' '), Year = unique(Year)))

Result:

                                Title Year
 AAADE BBBCF NBNJHB TTTTT VVVFF AASFE 2004
            DDDFFF ERFRR AAAAAA ERERE 2005
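One caveat with this answer (my own observation, not from the original post): tapply() returns its groups in sorted factor-level order, while unique(Year) returns the years in order of first appearance. The two agree here only because the data happens to be sorted by year. A sketch that takes the years from the names of the tapply() result instead, so each concatenated title stays paired with its own year (the names b, naive, titles and res are illustrative):

```r
a <- data.frame(
  Title = c("AAADE", "BBBCF", "NBNJHB", "TTTTT", "VVVFF",
            "AASFE", "DDDFFF", "ERFRR", "AAAAAA", "ERERE"),
  Year  = c(rep("2004", 6), rep("2005", 4)),
  stringsAsFactors = FALSE
)

# Put the 2005 rows first: unique(b$Year) is now c("2005", "2004"),
# but tapply() still returns its groups sorted as "2004", "2005"
b <- a[c(7:10, 1:6), ]

# The original pairing, for comparison: the 2004 titles get labelled "2005"
naive <- with(b, data.frame(Title = tapply(Title, Year, paste, collapse = " "),
                            Year = unique(Year), stringsAsFactors = FALSE))

# Safer: read the years back from the names of the tapply() result
titles <- with(b, tapply(Title, Year, paste, collapse = " "))
res <- data.frame(Title = as.vector(titles), Year = names(titles),
                  stringsAsFactors = FALSE)
print(res)
```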