I have a dataset all_transcripts with a column ConvID and a column Name:
>all_transcripts
ConvID Name
5 Guest
5 Guest
5 Agent
5 Guest
5 Agent
6 Reception
6 Guest
6 Agent
6 Guest
6 Guest
7 Reception
7 Reception
7 Guest
7 Guest
7 Reception
8 Reception
8 Guest
8 Agent
I want to get the unique names for each ConvID.
The output I am after looks like this:
5 ['Guest','Agent']
6 ['Reception','Guest','Agent']
7 ['Reception','Guest']
8 ['Reception','Guest','Agent']
To do this, I tried the aggregate function as follows:
aggregate(interactionId~name, all_transcripts, FUN= 'unique')
But this does not work. How can I change the code to get the desired output?
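Answer 0 (score: 1)
A tidyverse solution. The difference here is that nesting gives you a list column rather than a character vector column. Depending on your needs, this may or may not be preferable.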
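library(tidyverse, warn.conflicts = FALSE)
all_transcripts %>%
  # one row per ConvID; the remaining columns go into the list column `data`
  # (nest(-ConvID) is the older, now-deprecated spelling of the same call)
  nest(data = -ConvID) %>%
  # keep only the distinct names found in each nested data frame
  mutate(unique_names = map(data, ~ unique(.x$Name))) %>%
  select(-data)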
#> ConvID unique_names
#> 1 5 Guest, Agent
#> 2 6 Reception, Guest, Agent
#> 3 7 Reception, Guest
#> 4 8 Reception, Guest, Agent
A data.table solution:
library(data.table)
setDT(all_transcripts)
all_transcripts[, .(unique_names = list(unique(Name))) , by = ConvID]
#> ConvID unique_names
#> 1: 5 Guest,Agent
#> 2: 6 Reception,Guest,Agent
#> 3: 7 Reception,Guest
#> 4: 8 Reception,Guest,Agent
all_transcripts <- structure(list(ConvID = c(5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L,
6L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L), Name = c("Guest", "Guest",
"Agent", "Guest", "Agent", "Reception", "Guest", "Agent", "Guest",
"Guest", "Reception", "Reception", "Guest", "Guest", "Reception",
"Reception", "Guest", "Agent")), .Names = c("ConvID", "Name"), row.names = c(NA,
-18L), class = c("data.table", "data.frame"))
Answer 1 (score: 0)
The dplyr solution provided works for me, but if you want to stick with aggregate, you can do the following:
library(tibble)  # provides tribble()
df <- tribble(
~ConvID, ~Name,
5, "Guest",
5, "Guest",
5, "Agent",
5, "Guest",
5, "Agent",
6, "Reception",
6, "Guest",
6, "Agent",
6, "Guest",
6, "Guest",
7, "Reception",
7, "Reception",
7, "Guest",
7, "Guest",
7, "Reception",
8, "Reception",
8, "Guest",
8, "Agent"
)
unique_m <- function(x){
paste(unique(x), collapse = ", ")
}
df2 <- aggregate(Name~ConvID, df, FUN= 'unique_m')
df2
#> ConvID Name
#> 1 5 Guest, Agent
#> 2 6 Reception, Guest, Agent
#> 3 7 Reception, Guest
#> 4 8 Reception, Guest, Agent
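You need to wrap unique in a helper function that collapses its result into a single string; otherwise, because the groups have different numbers of unique names, Name in df2 ends up as a list column.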
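To make the list-column point concrete, here is a minimal base R sketch. The data frame `demo` is a hypothetical cut-down stand-in for all_transcripts (ConvID 5 and 6 only), not the original data. It compares plain unique with a collapsing helper like unique_m. Note that if every group happened to have the same number of unique names, aggregate would simplify the result to a matrix column instead of a list.

```r
# Hypothetical subset of the data: ConvID 5 has 2 distinct names, ConvID 6 has 3.
demo <- data.frame(
  ConvID = c(5, 5, 5, 6, 6, 6),
  Name   = c("Guest", "Guest", "Agent", "Reception", "Guest", "Agent"),
  stringsAsFactors = FALSE
)

# Plain unique(): the per-group results differ in length, so aggregate()
# cannot simplify them and stores one list element per ConvID.
as_list <- aggregate(Name ~ ConvID, demo, FUN = unique)
is.list(as_list$Name)

# Collapsing each group to a single string gives an ordinary character column.
as_text <- aggregate(Name ~ ConvID, demo,
                     FUN = function(x) paste(unique(x), collapse = ", "))
is.character(as_text$Name)
```

The list-column form is arguably closer to the output requested in the question, while the collapsed form is easier to print or write to a CSV file.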