I have the following data frame:
df <- data.frame(
ID = c(1,1,1,1,1,1,2,2,2,2,2,2),
group = c("S_1","G_1","G_2","G_3","M_1","M_2","G_1","G_2","S_1","S_2","M_1","M_2"),
CODE = c(0,1,0,0,1,1,0,1,0,0,1,1)
)
   ID group CODE
1   1   S_1    0
2   1   G_1    1
3   1   G_2    0
4   1   G_3    0
5   1   M_1    1
6   1   M_2    1
7   2   G_1    0
8   2   G_2    1
9   2   S_1    0
10  2   S_2    0
11  2   M_1    1
12  2   M_2    1
I want to summarise the CODE column so that I end up with one row per ID:
  ID     CODE
1  1 100,11,0
2  2 01,11,00
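For ID == 1, I want to paste the CODE values of G_1, G_2, G_3 together without a delimiter (in numeric order). The same goes for M_1, M_2 and for S_1. Finally, I want to put the summarised G, M, and S strings into one row, separated by commas (in alphabetical order).
I could probably strip the numbers and do group_by(group) %>% summarise(CODE = paste(CODE, collapse = "")) as a first step. However, I want the final string to be in alphabetical order.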
Answer 0 (score: 1)
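We can use tidyr::separate to split group into separate columns on the delimiter (_), then group_by ID and group1 and summarise to paste CODE into one string per group, and then summarise again to get one string per ID.
Without using separate, we could remove everything after _ in group and use the result as the grouping variable.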
library(dplyr)
df %>%
arrange(ID,group) %>%
tidyr::separate(group, into = c('group1', 'group2'), sep = "_") %>%
group_by(ID, group1) %>%
summarise(CODE = paste(CODE, collapse = "")) %>%
summarise(CODE = toString(CODE))
# A tibble: 2 x 2
# ID CODE
# <dbl> <chr>
#1 1 100, 11, 0
#2 2 01, 11, 00
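The variant without separate can be sketched as follows. This is only an illustration under a few assumptions: group1 is a name chosen here for the derived prefix column, the .groups argument requires dplyr 1.0.0 or later, and the final step uses paste(collapse = ",") instead of toString so that the result has no space after each comma, as in the output requested in the question.

```r
library(dplyr)

df <- data.frame(
  ID = c(1,1,1,1,1,1,2,2,2,2,2,2),
  group = c("S_1","G_1","G_2","G_3","M_1","M_2","G_1","G_2","S_1","S_2","M_1","M_2"),
  CODE = c(0,1,0,0,1,1,0,1,0,0,1,1)
)

result <- df %>%
  # Sort so digits are pasted in group order (G_1, G_2, ...).
  # Note: this is a string sort, so a label like G_10 would sort before G_2.
  arrange(ID, group) %>%
  # Drop everything from the underscore onwards and group on the prefix
  group_by(ID, group1 = sub("_.*", "", group)) %>%
  # One string per ID/prefix; keep the result grouped by ID
  summarise(CODE = paste(CODE, collapse = ""), .groups = "drop_last") %>%
  # One comma-separated string per ID
  summarise(CODE = paste(CODE, collapse = ","))

print(result)
```

Because group_by sorts on the prefix, the pieces come out in alphabetical order (G, M, S) within each ID.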
Answer 1 (score: 1)
Base R solution:
# Order the data frame and strip the numeric suffix from group:
ordered_df <- within(df[with(df, order(ID, group)), ], {
  group <- gsub("_.*", "", group)
})

# Paste CODE within each ID/group, then paste the pieces per ID:
aggregate(
  CODE ~ ID,
  do.call("rbind", lapply(
    split(ordered_df, paste0(ordered_df$ID, ordered_df$group)),
    function(x) {
      data.frame(ID = unique(x$ID), CODE = paste0(x$CODE, collapse = ""))
    }
  )),
  paste, collapse = ","
)
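The same two-step idea can also be written with two aggregate calls, which may be easier to read than the split/lapply/rbind combination. This is a sketch, not the answerer's code: the prefix column and the step1 and result names are introduced here purely for illustration.

```r
df <- data.frame(
  ID = c(1,1,1,1,1,1,2,2,2,2,2,2),
  group = c("S_1","G_1","G_2","G_3","M_1","M_2","G_1","G_2","S_1","S_2","M_1","M_2"),
  CODE = c(0,1,0,0,1,1,0,1,0,0,1,1)
)

# Sort so digits are pasted in group order within each ID
ordered_df <- df[order(df$ID, df$group), ]

# Hypothetical helper column: the group label without its numeric suffix
ordered_df$prefix <- sub("_.*", "", ordered_df$group)

# Step 1: one string per ID/prefix, with no delimiter
step1 <- aggregate(CODE ~ ID + prefix, data = ordered_df,
                   FUN = paste, collapse = "")

# Make the alphabetical order of the prefixes explicit before step 2
step1 <- step1[order(step1$ID, step1$prefix), ]

# Step 2: one comma-separated string per ID
result <- aggregate(CODE ~ ID, data = step1, FUN = paste, collapse = ",")

print(result)
```

Extra arguments such as collapse are passed through aggregate to paste, so no anonymous function is needed.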