I have a data frame:
Col1 Col2 Col3
partner1 A 20
partner1 B 70
partner2 A 30
partner2 C 20
partner3 B 50
partner3 C 40
How can I transform it so that it is aggregated by Col1 only, with the percentage share of each Col2 value shown in a new column:
Col1 Col3 Col4
partner1 90 A: 22.2%, B: 77.8%
partner2 50 A: 60.0%, C: 40.0%
partner3 90 B: 55.6%, C: 44.4%
Thanks!
Answer 0 (score: 3)
You need summarize, with the string formatted correctly. Here is an option using sprintf, assuming Col2 is unique within each Col1 group:
library(dplyr)

df %>%
  group_by(Col1) %>%
  # "%s: %.1f%%" formats Col2 as a string (%s), the percentage as a float
  # rounded to one decimal place (%.1f), and appends a literal percent sign (%%).
  # Col4 is computed first, because Col3 is overwritten by its sum afterwards.
  summarise(Col4 = toString(sprintf("%s: %.1f%%", Col2, Col3 * 100 / sum(Col3))),
            Col3 = sum(Col3))
# A tibble: 3 × 3
# Col1 Col4 Col3
# <fctr> <chr> <int>
#1 partner1 A: 22.2%, B: 77.8% 90
#2 partner2 A: 60.0%, C: 40.0% 50
#3 partner3 B: 55.6%, C: 44.4% 90
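To see what the formatting step does on its own, here is a minimal base R sketch for a single group. The vectors col2 and col3 are illustrative stand-ins for the partner1 rows from the question, not part of the original answer:

```r
# Illustrative values for the partner1 group (taken from the question's data)
col2 <- c("A", "B")
col3 <- c(20, 70)

# sprintf() is vectorised: it returns one formatted string per element
parts <- sprintf("%s: %.1f%%", col2, col3 * 100 / sum(col3))

# toString() joins the pieces with ", "
col4 <- toString(parts)
print(col4)
# [1] "A: 22.2%, B: 77.8%"
```

Inside summarise the same two calls run once per Col1 group, which is why the result is a single string per partner.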
Answer 1 (score: 1)
Use prop.table on Col3.
library(data.table)
# Assumes df1 is a data.table and Col2 is unique within each Col1 group
df1[, .(col3 = sum(Col3), Col4 = list(setNames(prop.table(Col3) * 100, unique(Col2)))), by = 'Col1']
# Col1 col3 Col4
# 1: partner1 90 22.22222,77.77778
# 2: partner2 50 60,40
# 3: partner3 90 55.55556,44.44444
str(df1[, .(col3 = sum(Col3), Col4 = list( setNames(prop.table(Col3)*100, unique(Col2)))), by = 'Col1'])
# Classes ‘data.table’ and 'data.frame': 3 obs. of 3 variables:
# $ Col1: chr "partner1" "partner2" "partner3"
# $ col3: int 90 50 90
# $ Col4:List of 3
# ..$ : Named num 22.2 77.8
# .. ..- attr(*, "names")= chr "A" "B"
# ..$ : Named num 60 40
# .. ..- attr(*, "names")= chr "A" "C"
# ..$ : Named num 55.6 44.4
# .. ..- attr(*, "names")= chr "B" "C"
# - attr(*, ".internal.selfref")=<externalptr>
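Note that Col4 here is a list column of named numeric vectors, not the formatted string the question asks for. A rough base R sketch of how one such vector could be turned into that string follows; the names col2, col3, pct and label are illustrative and not from the answer:

```r
# Illustrative values for the partner1 group (taken from the question's data)
col2 <- c("A", "B")
col3 <- c(20L, 70L)

# Same construction as in the answer: percentages named by Col2
pct <- setNames(prop.table(col3) * 100, col2)

# Format each element as "name: xx.x%" and join with ", "
label <- paste0(names(pct), ": ", sprintf("%.1f%%", pct), collapse = ", ")
print(label)
# [1] "A: 22.2%, B: 77.8%"
```

The same paste0/sprintf expression could presumably be used in place of list(...) inside the data.table call to get a character column directly.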
Answer 2 (score: 1)
Here is a base R solution. It uses split on the grouping variable, then combines the results from each subgroup.
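do.call(rbind, lapply(split(df, df$Col1), function(a)
  # as.character() and drop = TRUE guard against Col1/Col2 being factors
  cbind(as.character(a$Col1[1]),
        sum(a[, 3]),
        paste(sapply(split(a, a$Col2, drop = TRUE), function(b)
          paste(b$Col2[1], ":", round(100 * sum(b[, 3]) / sum(a[, 3]), 2), "%", sep = "")
        ), collapse = " "))))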
# [,1] [,2] [,3]
#[1,] "partner1" "90" "A:22.22% B:77.78%"
#[2,] "partner2" "50" "A:60% C:40%"
#[3,] "partner3" "90" "B:55.56% C:44.44%"
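The result above is a character matrix, so the totals are strings as well. As a variation on the same split/lapply idea, here is a self-contained sketch that returns a data frame in the format the question asks for. The helper name summarise_group is hypothetical, and the data frame is rebuilt from the question's table:

```r
# Data from the question; stringsAsFactors = FALSE keeps Col1/Col2 as character
df <- data.frame(
  Col1 = rep(c("partner1", "partner2", "partner3"), each = 2),
  Col2 = c("A", "B", "A", "C", "B", "C"),
  Col3 = c(20, 70, 30, 20, 50, 40),
  stringsAsFactors = FALSE
)

# Hypothetical helper: summarise one Col1 group into a one-row data frame
summarise_group <- function(a) {
  totals <- tapply(a$Col3, a$Col2, sum)            # sum per Col2 value
  pct <- as.numeric(100 * totals / sum(totals))    # percentage share of each
  data.frame(
    Col1 = a$Col1[1],
    Col3 = sum(a$Col3),
    Col4 = toString(sprintf("%s: %.1f%%", names(totals), pct)),
    stringsAsFactors = FALSE
  )
}

# Split by Col1, summarise each piece, and stack the rows
result <- do.call(rbind, lapply(split(df, df$Col1), summarise_group))
rownames(result) <- NULL
print(result)
```

Because tapply sums within each Col2 value first, this version does not rely on Col2 being unique within a Col1 group, and Col3 stays numeric.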