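Calculating percentage difference in an R data frame

Time: 2016-10-20 18:17:16

Tags: r

I am trying to figure out how to do some basic math with a data frame.

I have a data frame that looks like this: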

| Version | Total | Case   |
|---------|-------|--------|
| 1.0.1   | 110   | Case 1 |
| 1.0.2   | 111   | Case 1 |
| 1.0.3   | 114   | Case 1 |
| 1.0.4   | 114   | Case 1 |
| 1.0.5   | 113   | Case 1 |
| 1.0.1   |  53   | Case 2 |
| 1.0.2   |  53   | Case 2 |
| 1.0.3   |  56   | Case 2 |
| 1.0.4   |  57   | Case 2 |
| 1.0.5   |  55   | Case 2 |
| 1.0.1   | 110   | Case 3 |
| 1.0.2   | 111   | Case 3 |
| 1.0.3   | 113   | Case 3 |
| 1.0.4   | 114   | Case 3 |
| 1.0.5   | 113   | Case 3 |
| 1.0.1   |  52   | Case 4 |
| 1.0.2   |  53   | Case 4 |
| 1.0.3   |  56   | Case 4 |
| 1.0.4   |  57   | Case 4 |
| 1.0.5   |  55   | Case 4 |

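I want to calculate the "percentage difference" between Case 1 and Case 2, and then between Case 3 and Case 4, for each version. So for version 1.0.1 (Case 1 vs. Case 2) it would do this math: (110-53)/(.5*(110+53))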
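As a quick check of that formula, here is a minimal base R sketch. The helper name `pct_diff` is made up for illustration and does not come from any package.

```r
# Hypothetical helper: the difference between a and b divided by
# the mean of the two values (a symmetric percentage difference).
pct_diff <- function(a, b) {
  (a - b) / (0.5 * (a + b))
}

# Version 1.0.1, Case 1 (110) vs. Case 2 (53)
pct_diff(110, 53)
# [1] 0.6993865
```

Multiplying by 100 and rounding gives the 70% shown for 1.0.1 in the desired table.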

The end result would be a table that looks like this:

| Version | Total | Case       |
|---------|-------|------------|
| 1.0.1   | 70%   | Case 1 & 2 |
| 1.0.2   | 71%   | Case 1 & 2 |
| 1.0.3   | 68%   | Case 1 & 2 |
| 1.0.4   | 67%   | Case 1 & 2 |
| 1.0.5   | 69%   | Case 1 & 2 |
| 1.0.1   | 72%   | Case 3 & 4 |
| 1.0.2   | 71%   | Case 3 & 4 |
| 1.0.3   | 67%   | Case 3 & 4 |
| 1.0.4   | 67%   | Case 3 & 4 |
| 1.0.5   | 69%   | Case 3 & 4 |

Edit: here is a working example of the first table.

Version <- c('1.0.1', '1.0.2', '1.0.3', '1.0.4', '1.0.5', '1.0.1', '1.0.2', '1.0.3', '1.0.4', '1.0.5', '1.0.1', '1.0.2', '1.0.3', '1.0.4', '1.0.5', '1.0.1', '1.0.2', '1.0.3', '1.0.4', '1.0.5')
Total <- c(110, 111, 114, 114, 113, 53, 53, 56, 57, 55, 110, 111, 113, 114, 113, 52, 53, 56, 57, 55)
Case <- c('Case 1', 'Case 1', 'Case 1', 'Case 1', 'Case 1', 'Case 2', 'Case 2', 'Case 2', 'Case 2', 'Case 2', 'Case 3', 'Case 3', 'Case 3', 'Case 3', 'Case 3', 'Case 4', 'Case 4', 'Case 4', 'Case 4', 'Case 4')
df <- data.frame(Version, Total, Case)

2 Answers:

Answer 0 (score: 3)

You can use the data.table package:

library(data.table)
setDT(df)  # convert df to a data.table by reference
# For each Version, compute the percentage difference (in percent) for each pair of cases
ans = df[, .(`case 1 & 2` = 200*(.SD[Case=="Case 1", Total] - .SD[Case=="Case 2", Total]) / (.SD[Case=="Case 1", Total] + .SD[Case=="Case 2", Total]),
             `case 3 & 4` = 200*(.SD[Case=="Case 3", Total] - .SD[Case=="Case 4", Total]) / (.SD[Case=="Case 3", Total] + .SD[Case=="Case 4", Total])
       ), by=Version]
#    Version case 1 & 2 case 3 & 4
# 1:   1.0.1   69.93865   71.60494
# 2:   1.0.2   70.73171   70.73171
# 3:   1.0.3   68.23529   67.45562
# 4:   1.0.4   66.66667   66.66667
# 5:   1.0.5   69.04762   69.04762

If you need the result in long format, you can use melt:

melt(ans, id.vars="Version")
#    Version   variable    value
# 1:   1.0.1 case 1 & 2 69.93865
# 2:   1.0.2 case 1 & 2 70.73171
# 3:   1.0.3 case 1 & 2 68.23529
# 4:   1.0.4 case 1 & 2 66.66667
# 5:   1.0.5 case 1 & 2 69.04762
# 6:   1.0.1 case 3 & 4 71.60494
# 7:   1.0.2 case 3 & 4 70.73171
# 8:   1.0.3 case 3 & 4 67.45562
# 9:   1.0.4 case 3 & 4 66.66667
#10:   1.0.5 case 3 & 4 69.04762

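One more piece of advice: I suggest not using spaces or special characters in column names. Although you can escape them by wrapping the name in backticks, they can cause problems. It is better to name the columns something like case_a_b.

Answer 1 (score: 2)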
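To illustrate the naming point, here is a small base R sketch. The table `wide` and the replacement name `case_1_2` are assumptions chosen for this example only.

```r
# Illustrative data; check.names = FALSE keeps the non-syntactic name as typed
# (by default data.frame() would rewrite it into a syntactic one).
wide <- data.frame(Version = c("1.0.1", "1.0.2"),
                   `case 1 & 2` = c(69.93865, 70.73171),
                   check.names = FALSE)

# A name containing spaces or "&" needs backticks on every access.
wide$`case 1 & 2`

# Renaming once to a syntactic name avoids the quoting afterwards.
names(wide)[names(wide) == "case 1 & 2"] <- "case_1_2"
wide$case_1_2
```

With a data.table, `setnames()` would be the usual way to do the same rename by reference.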

Another solution using data.table with dcast:

library(data.table)
dt <- fread(" Version | Total | Case  
             1.0.1   | 110   | Case 1 
             1.0.2   | 111   | Case 1 
             1.0.3   | 114   | Case 1 
             1.0.4   | 114   | Case 1 
             1.0.5   | 113   | Case 1 
             1.0.1   |  53   | Case 2 
             1.0.2   |  53   | Case 2 
             1.0.3   |  56   | Case 2 
             1.0.4   |  57   | Case 2 
             1.0.5   |  55   | Case 2 
             1.0.1   | 110   | Case 3 
             1.0.2   | 111   | Case 3 
             1.0.3   | 113   | Case 3 
             1.0.4   | 114   | Case 3 
             1.0.5   | 113   | Case 3 
             1.0.1   |  52   | Case 4 
             1.0.2   |  53   | Case 4 
             1.0.3   |  56   | Case 4 
             1.0.4   |  57   | Case 4 
             1.0.5   |  55   | Case 4 ")

dcast(dt, Version ~ Case, value.var = "Total")[,
        .(Version, Case_1_2 = (`Case 1`-`Case 2`)/(.5*(`Case 1`+`Case 2`)),
          Case_3_4 = (`Case 3`-`Case 4`)/(.5*(`Case 3`+`Case 4`)))]

   Version  Case_1_2  Case_3_4
1:   1.0.1 0.6993865 0.7160494
2:   1.0.2 0.7073171 0.7073171
3:   1.0.3 0.6823529 0.6745562
4:   1.0.4 0.6666667 0.6666667
5:   1.0.5 0.6904762 0.6904762
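
If the goal is the long table with whole-number percentages from the question, one possible follow-up is sketched below in base R. The values are copied from the output above for illustration, and the object names `res` and `long` are assumptions.

```r
# Fractions per version, copied from the result above.
res <- data.frame(
  Version  = c("1.0.1", "1.0.2", "1.0.3", "1.0.4", "1.0.5"),
  Case_1_2 = c(0.6993865, 0.7073171, 0.6823529, 0.6666667, 0.6904762),
  Case_3_4 = c(0.7160494, 0.7073171, 0.6745562, 0.6666667, 0.6904762)
)

# Stack the two comparison columns and format each value as a whole percentage.
long <- data.frame(
  Version = rep(res$Version, times = 2),
  Total   = sprintf("%.0f%%", 100 * c(res$Case_1_2, res$Case_3_4)),
  Case    = rep(c("Case 1 & 2", "Case 3 & 4"), each = nrow(res))
)
long
```

Note that `Total` becomes a character column here, so this formatting step is best left until the numbers are no longer needed for further calculation.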