Get the percentage of each value and add it to a new column (after aggregation)

Time: 2017-03-08 20:59:24

Tags: r dplyr

I have a data frame:

    Col1       Col2       Col3            
partner1          A         20      
partner1          B         70
partner2          A         30
partner2          C         20
partner3          B         50
partner3          C         40
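
For anyone who wants to run the answers, the data frame above could be recreated roughly as follows. This is only a sketch: the name `df` and the choice of character columns (`stringsAsFactors = FALSE`) are assumptions, since the question does not show how the data was built.

```r
# Rebuild the example data shown above.
# Assumption: Col1 and Col2 are character vectors, Col3 is numeric.
df <- data.frame(
  Col1 = c("partner1", "partner1", "partner2", "partner2", "partner3", "partner3"),
  Col2 = c("A", "B", "A", "C", "B", "C"),
  Col3 = c(20, 70, 30, 20, 50, 40),
  stringsAsFactors = FALSE
)

# Per-partner totals, which become Col3 in the desired output
totals <- tapply(df$Col3, df$Col1, sum)
```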

How can I transform it so that it is aggregated by Col1 only, with the percentage of each Col2 value shown in a new column:

    Col1       Col3                    Col4            
partner1         90      A: 22.2%, B: 77.8% 
partner2         50      A: 60.0%, C: 40.0%
partner3         90      B: 55.6%, C: 44.4%

Thanks!

3 answers:

Answer 0: (score: 3)

You need summarise, plus some string formatting. Here is one option using sprintf, assuming Col2 is unique within each Col1 group:

library(dplyr)

df %>%
  group_by(Col1) %>%
  # "%s: %.1f%%" formats Col2 as a string (%s), the percentage as a float
  # rounded to one decimal place (%.1f), and a literal percent sign (%%).
  # Col4 is computed first, while Col3 still holds the per-row values.
  summarise(Col4 = toString(sprintf("%s: %.1f%%", Col2, Col3 * 100 / sum(Col3))),
            Col3 = sum(Col3))

# A tibble: 3 × 3
#      Col1               Col4  Col3
#    <fctr>              <chr> <int>
#1 partner1 A: 22.2%, B: 77.8%    90
#2 partner2 A: 60.0%, C: 40.0%    50
#3 partner3 B: 55.6%, C: 44.4%    90
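
The answer above assumes each Col2 value appears once per Col1 group. If that does not hold, one possible approach is to sum Col3 per (Col1, Col2) pair first and then apply the same formatting. The sketch below uses hypothetical data (`df_dup`, in which partner1 has two rows for "A"); it is not part of the original answer.

```r
library(dplyr)

# Hypothetical data: "A" appears twice for partner1
df_dup <- data.frame(
  Col1 = c("partner1", "partner1", "partner1"),
  Col2 = c("A", "A", "B"),
  Col3 = c(10, 10, 70),
  stringsAsFactors = FALSE
)

result <- df_dup %>%
  # Step 1: collapse duplicate (Col1, Col2) pairs into one row each
  group_by(Col1, Col2) %>%
  summarise(Col3 = sum(Col3)) %>%
  # Step 2: same formatting as in the answer, now on unique Col2 values
  group_by(Col1) %>%
  summarise(Col4 = toString(sprintf("%s: %.1f%%", Col2, Col3 * 100 / sum(Col3))),
            Col3 = sum(Col3))
```

Here `result$Col4` should come out as "A: 22.2%, B: 77.8%", the same as for partner1 in the original data.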

Answer 1: (score: 1)

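Use prop.table on Col3. Here df1 is a data.table, and Col2 is assumed to be unique within each Col1 group, since unique(Col2) supplies the names: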

library('data.table')
df1[, .(col3 = sum(Col3), Col4 = list( setNames(prop.table(Col3)*100, unique(Col2)))), by = 'Col1']
#       Col1 col3              Col4
# 1: partner1   90 22.22222,77.77778
# 2: partner2   50             60,40
# 3: partner3   90 55.55556,44.44444

str(df1[, .(col3 = sum(Col3), Col4 = list( setNames(prop.table(Col3)*100, unique(Col2)))), by = 'Col1'])
# Classes ‘data.table’ and 'data.frame':    3 obs. of  3 variables:
# $ Col1: chr  "partner1" "partner2" "partner3"
# $ col3: int  90 50 90
# $ Col4:List of 3
# ..$ : Named num  22.2 77.8
# .. ..- attr(*, "names")= chr  "A" "B"
# ..$ : Named num  60 40
# .. ..- attr(*, "names")= chr  "A" "C"
# ..$ : Named num  55.6 44.4
# .. ..- attr(*, "names")= chr  "B" "C"
# - attr(*, ".internal.selfref")=<externalptr> 
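
The result above keeps the percentages in a list column. If a single formatted string per group is wanted, as in the question, a variation along these lines might work. It is a sketch, not the answerer's code: the construction of `df1` and the sprintf format are assumptions.

```r
library(data.table)

# Assumed construction of df1 from the question's data
df1 <- data.table(
  Col1 = c("partner1", "partner1", "partner2", "partner2", "partner3", "partner3"),
  Col2 = c("A", "B", "A", "C", "B", "C"),
  Col3 = c(20, 70, 30, 20, 50, 40)
)

# prop.table(Col3) gives each value's share of its group total;
# sprintf formats it to one decimal place, paste joins the pieces per group.
res <- df1[, .(Col3 = sum(Col3),
               Col4 = paste(sprintf("%s: %.1f%%", Col2, prop.table(Col3) * 100),
                            collapse = ", ")),
           by = Col1]
```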

Answer 2: (score: 1)

Here is a base R solution. It splits the data on the grouping variable and then combines the results from the subgroups.

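# Note: this assumes Col1 and Col2 are character columns, not factors
# (cbind would turn a factor into its integer code).
do.call(rbind, lapply(split(df, df$Col1), function(a)
    # one row per Col1 group: name, total of Col3, pasted percentages
    cbind(a$Col1[1], sum(a[,3]), paste(sapply(split(a, a$Col2), function(b)
        paste(b$Col2[1],":",round(100*sum(b[,3])/sum(a[,3]),2),"%", sep = "")
        ), collapse = " "))))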
#            [,1]       [,2] [,3]               
#[1,] "partner1" "90" "A:22.22% B:77.78%"
#[2,] "partner2" "50" "A:60% C:40%"      
#[3,] "partner3" "90" "B:55.56% C:44.44%"
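
The call above returns a character matrix. If a data frame with a numeric Col3 is preferred, a base R variation with ave and tapply could look like the sketch below. The data construction, the intermediate names (`pct`, `labels`, `out`), and the one-decimal format are assumptions made for illustration.

```r
# Assumed construction of the question's data
df <- data.frame(
  Col1 = c("partner1", "partner1", "partner2", "partner2", "partner3", "partner3"),
  Col2 = c("A", "B", "A", "C", "B", "C"),
  Col3 = c(20, 70, 30, 20, 50, 40),
  stringsAsFactors = FALSE
)

# Percentage of each row within its Col1 group
pct <- 100 * df$Col3 / ave(df$Col3, df$Col1, FUN = sum)
labels <- sprintf("%s: %.1f%%", df$Col2, pct)

# Aggregate per Col1: total of Col3 and the joined labels
totals <- tapply(df$Col3, df$Col1, sum)
col4 <- tapply(labels, df$Col1, paste, collapse = ", ")

out <- data.frame(
  Col1 = names(totals),
  Col3 = as.vector(totals),
  Col4 = as.vector(col4),
  stringsAsFactors = FALSE
)
```

Unlike the split-based version, this assumes Col2 is unique within each Col1 group; duplicates would be listed separately rather than summed.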