Find percentage based on two columns

Date: 2016-03-28 20:22:41

Tags: r frequency percentage

Here is what the original data frame looks like:

         PLACEMENT      SIZE      COST
1        placement1     LARGE    1838128.00
58       placement1     MEDIUM   10962048.00
117      placement1     SMALL    2622851.00
175      placement1     UNKNOWN  443.00
2        placement2     LARGE    598.00
59       placement2     MEDIUM   24358.00
118      placement2     SMALL    571802.00
176      placement2     UNKNOWN  1706.00
3        placement3     LARGE    8.00
60       placement3     MEDIUM   22.00  
119      placement3     SMALL    502388.00
177      placement3     UNKNOWN  762.00

How do I create a column that shows each row's COST as a percentage of the total COST for its PLACEMENT?

I'd like it to end up looking like this:

         PLACEMENT      SIZE      COST           PERCENTAGE
1        placement1     LARGE    1838128.00         11.9
58       placement1     MEDIUM   10962048.00        71.1
117      placement1     SMALL    2622851.00         17.0
175      placement1     UNKNOWN  443.00              0.0 
2        placement2     LARGE    598.00              0.1
59       placement2     MEDIUM   24358.00           4.07
118      placement2     SMALL    571802.00         95.54
176      placement2     UNKNOWN  1706.00            0.29
3        placement3     LARGE    8.00                0.0
60       placement3     MEDIUM   22.00               0.0
119      placement3     SMALL    502388.00         99.84
177      placement3     UNKNOWN  762.00             0.16 

Any help would be great, thanks! I couldn't figure out the prop.table function, even though I have a feeling it's what I should be using.

3 answers:

Answer 0 (score: 2)

You can do this quickly with dplyr:

library(dplyr)
# PERCENTAGE is each row's share of its PLACEMENT's total COST (0 to 1)
df <- df %>% group_by(PLACEMENT) %>% mutate(PERCENTAGE = COST / sum(COST))

It looks like your desired result is also rounded; you can do that with the round() function if you like.

Edit: if you'd rather have the percentages between 0 and 100, you can of course write 100 * COST / sum(COST) instead.
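A minimal sketch combining both suggestions, assuming dplyr is installed and using a small illustrative subset of the question's rows: the share is scaled to 0-100 and rounded to two decimals (the two-decimal choice is an assumption based on the desired output), and ungroup() drops the grouping so later operations on df are not silently done per PLACEMENT.

```r
library(dplyr)

# Illustrative subset of the question's data (placement1 and placement2 only)
df <- data.frame(
  PLACEMENT = rep(c("placement1", "placement2"), each = 4),
  SIZE      = rep(c("LARGE", "MEDIUM", "SMALL", "UNKNOWN"), times = 2),
  COST      = c(1838128, 10962048, 2622851, 443, 598, 24358, 571802, 1706)
)

df <- df %>%
  group_by(PLACEMENT) %>%
  # share of COST within each PLACEMENT, scaled to 0-100 and rounded
  mutate(PERCENTAGE = round(100 * COST / sum(COST), 2)) %>%
  # drop the grouping so later verbs work on the whole table again
  ungroup()

print(df)
```

For placement1 this yields 11.92, 71.07, 17.01 and 0.00, close to the layout requested in the question.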

Answer 1 (score: 1)

Assuming your input data frame is DF, you can do this. No packages are needed:

transform(DF, PC = 100 * ave(COST, PLACEMENT, FUN = prop.table)) 

which gives:

     PLACEMENT    SIZE     COST           PC
1   placement1   LARGE  1838128 11.917733169
58  placement1  MEDIUM 10962048 71.073811535
117 placement1   SMALL  2622851 17.005583050
175 placement1 UNKNOWN      443  0.002872246
2   placement2   LARGE      598  0.099922468
59  placement2  MEDIUM    24358  4.070086087
118 placement2   SMALL   571802 95.544928350
176 placement2 UNKNOWN     1706  0.285063095
3   placement3   LARGE        8  0.001589888
60  placement3  MEDIUM       22  0.004372193
119 placement3   SMALL   502388 99.842601057
177 placement3 UNKNOWN      762  0.151436862
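Why this works, as a small base R sketch with made-up numbers (the vectors cost and grp are illustrative only): prop.table(x) on a plain numeric vector returns x / sum(x), and ave(x, g, FUN = f) applies f separately to the values of x within each level of g, putting the results back in the original row order.

```r
# prop.table on a vector: each element divided by the vector's total
print(prop.table(c(2, 3, 5)))

# ave() applies FUN within each group and keeps the original order,
# so every element becomes its share of its own group's total
cost  <- c(2, 6, 1, 3, 4)
grp   <- c("a", "a", "b", "b", "b")
share <- ave(cost, grp, FUN = prop.table)
print(share)
```

Here group "a" totals 8, giving 0.25 and 0.75, and group "b" also totals 8, giving 0.125, 0.375 and 0.5. Multiplying by 100, as in the answer, turns these shares into percentages.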

Note: the input in reproducible form is:

Lines <- "PLACEMENT      SIZE      COST
1        placement1     LARGE    1838128.00
58       placement1     MEDIUM   10962048.00
117      placement1     SMALL    2622851.00
175      placement1     UNKNOWN  443.00
2        placement2     LARGE    598.00
59       placement2     MEDIUM   24358.00
118      placement2     SMALL    571802.00
176      placement2     UNKNOWN  1706.00
3        placement3     LARGE    8.00
60       placement3     MEDIUM   22.00  
119      placement3     SMALL    502388.00
177      placement3     UNKNOWN  762.00"

DF <- read.table(text = Lines, header = TRUE)
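A quick sanity check of the base R approach on the full data, sketched under the assumption that two-decimal rounding is wanted for display: within every PLACEMENT the unrounded percentages should add up to 100. The data frame is rebuilt with data.frame() so the snippet runs on its own; the extra column PC_ROUNDED is a hypothetical name chosen for illustration.

```r
# Same data as the reproducible input above, rebuilt with data.frame()
DF <- data.frame(
  PLACEMENT = rep(c("placement1", "placement2", "placement3"), each = 4),
  SIZE      = rep(c("LARGE", "MEDIUM", "SMALL", "UNKNOWN"), times = 3),
  COST      = c(1838128, 10962048, 2622851, 443,
                598, 24358, 571802, 1706,
                8, 22, 502388, 762)
)

# Percentage of COST within each PLACEMENT (0-100)
DF <- transform(DF, PC = 100 * ave(COST, PLACEMENT, FUN = prop.table))

# Sanity check: each group's unrounded percentages sum to 100
print(tapply(DF$PC, DF$PLACEMENT, sum))

# Rounded copy for display (hypothetical column name)
DF$PC_ROUNDED <- round(DF$PC, 2)
print(DF)
```

Keeping the unrounded PC column alongside the rounded one avoids rounding error in any later arithmetic; the rounded values alone need not sum exactly to 100.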

Answer 2 (score: 0)

Here is an option using data.table:

library(data.table)
# Convert df to a data.table in place, then add each row's share of its PLACEMENT's total COST
setDT(df)[, PERCENTAGE := COST / sum(COST), by = PLACEMENT]
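A slightly extended sketch of the same idea, assuming data.table is installed and using an illustrative subset of the rows: the share is multiplied by 100 and rounded to two decimals to match the desired output. setDT() converts df by reference and := adds the column in place, so no reassignment is needed.

```r
library(data.table)

# Illustrative subset of the question's data (placement2 and placement3 only)
df <- data.frame(
  PLACEMENT = rep(c("placement2", "placement3"), each = 4),
  SIZE      = rep(c("LARGE", "MEDIUM", "SMALL", "UNKNOWN"), times = 2),
  COST      = c(598, 24358, 571802, 1706, 8, 22, 502388, 762)
)

# Convert to data.table by reference, then add a rounded 0-100 percentage per PLACEMENT
setDT(df)[, PERCENTAGE := round(100 * COST / sum(COST), 2), by = PLACEMENT]
print(df)
```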