I'm trying to figure out a way to group data and then create a column based on the contents of the grouped rows.
Example df to work with:
df <- tibble::tribble(
~name, ~position, ~G,
"DJ LeMahieu", "1B", 40,
"DJ LeMahieu", "2B", 75,
"DJ LeMahieu", "3B", 52,
"Max Muncy", "1B", 65,
"Max Muncy", "2B", 70,
"Max Muncy", "3B", 35,
"Whit Merrifield", "2B", 82,
"Whit Merrifield", "OF", 61
)
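I then want to group this at the name level and create a new column called extra_position. This column is the concatenation of the values in the position column, separated by "/". Example output below: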
output_df <- tibble::tribble(
~name, ~extra_position,
"DJ LeMahieu", "1B/2B/3B",
"Max Muncy", "1B/2B/3B",
"Whit Merrifield", "2B/OF"
)
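If possible, I'd like to stay within the tidyverse. I'm also curious whether you can control the order of the concatenated values. For example, could you make DJ LeMahieu's extra_position read "3B/2B/1B"?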
Answer 0 (score: 1)
We can group by 'name' and combine the elements of the 'position' column into a single string with str_c (or paste) and its collapse argument:
library(dplyr)
library(stringr)
df %>%
  group_by(name) %>%
  summarise(extra_position = str_c(position, collapse = "/"))
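Since paste is mentioned as an alternative to str_c, here is a minimal sketch of the same summary without stringr. It rebuilds the question's df so it runs on its own; the name result is only illustrative. One difference worth knowing: paste turns a missing value into the text "NA", whereas str_c returns NA for the whole string.

```r
library(dplyr)

# Same sample data as in the question
df <- tibble::tribble(
  ~name, ~position, ~G,
  "DJ LeMahieu", "1B", 40,
  "DJ LeMahieu", "2B", 75,
  "DJ LeMahieu", "3B", 52,
  "Max Muncy", "1B", 65,
  "Max Muncy", "2B", 70,
  "Max Muncy", "3B", 35,
  "Whit Merrifield", "2B", 82,
  "Whit Merrifield", "OF", 61
)

# paste() with collapse joins all positions of a group into one string
result <- df %>%
  group_by(name) %>%
  summarise(extra_position = paste(position, collapse = "/"))

result
```

The positions are joined in the order they appear within each group, so sorting or reversing the vector before collapsing controls the output order.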
If we need to reverse the order, wrap the column in rev:
df %>%
  group_by(name) %>%
  summarise(extra_position = str_c(rev(position), collapse = "/"))
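Or, if the order should be based on the values, sort them first (here with gtools::mixedsort in decreasing order):
df %>%
  group_by(name) %>%
  summarise(extra_position = str_c(gtools::mixedsort(position,
    decreasing = TRUE), collapse = "/"))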
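"Based on the values" could also mean ordering by another column. As a hedged sketch, assuming the goal is to list each player's positions from most to fewest games played (the G column in the question's data), order can index position before collapsing. The name by_games is only illustrative.

```r
library(dplyr)

# Same sample data as in the question
df <- tibble::tribble(
  ~name, ~position, ~G,
  "DJ LeMahieu", "1B", 40,
  "DJ LeMahieu", "2B", 75,
  "DJ LeMahieu", "3B", 52,
  "Max Muncy", "1B", 65,
  "Max Muncy", "2B", 70,
  "Max Muncy", "3B", 35,
  "Whit Merrifield", "2B", 82,
  "Whit Merrifield", "OF", 61
)

# Within each group, reorder positions by G (descending), then collapse
by_games <- df %>%
  group_by(name) %>%
  summarise(
    extra_position = paste(position[order(G, decreasing = TRUE)], collapse = "/")
  )

by_games
```

Indexing with order() inside summarise keeps everything in one step; calling arrange(name, desc(G)) before group_by would be an equivalent alternative.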
Or with data.table:
library(data.table)
setDT(df)[, .(extra_position = paste(position, collapse="/")), .(name)]
In base R, using aggregate:
aggregate(position ~ name, df, paste, collapse="/")
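The aggregate call keeps the column name position and the original order. A minimal base R sketch that also covers the two extras from the question (the extra_position name and the reversed order) could look like this; it uses a plain data.frame so no packages are needed, and res is only an illustrative name.

```r
# Same sample data as in the question, as a plain data.frame
df <- data.frame(
  name = c(rep("DJ LeMahieu", 3), rep("Max Muncy", 3), rep("Whit Merrifield", 2)),
  position = c("1B", "2B", "3B", "1B", "2B", "3B", "2B", "OF"),
  G = c(40, 75, 52, 65, 70, 35, 82, 61)
)

# Reverse each group's positions before collapsing them into one string
res <- aggregate(position ~ name, df,
                 function(x) paste(rev(x), collapse = "/"))

# aggregate() names the result after the input column, so rename it
names(res)[names(res) == "position"] <- "extra_position"

res
```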