I'm trying to figure out how to do some basic math with a data frame.
I have a data frame that looks like this:
| Version | Total | Case |
|---------|-------|--------|
| 1.0.1 | 110 | Case 1 |
| 1.0.2 | 111 | Case 1 |
| 1.0.3 | 114 | Case 1 |
| 1.0.4 | 114 | Case 1 |
| 1.0.5 | 113 | Case 1 |
| 1.0.1 | 53 | Case 2 |
| 1.0.2 | 53 | Case 2 |
| 1.0.3 | 56 | Case 2 |
| 1.0.4 | 57 | Case 2 |
| 1.0.5 | 55 | Case 2 |
| 1.0.1 | 110 | Case 3 |
| 1.0.2 | 111 | Case 3 |
| 1.0.3 | 113 | Case 3 |
| 1.0.4 | 114 | Case 3 |
| 1.0.5 | 113 | Case 3 |
| 1.0.1 | 52 | Case 4 |
| 1.0.2 | 53 | Case 4 |
| 1.0.3 | 56 | Case 4 |
| 1.0.4 | 57 | Case 4 |
| 1.0.5 | 55 | Case 4 |
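I want to calculate the "percent difference" between Case 1 and Case 2, and then between Case 3 and Case 4, for each version. So for 1.0.1 (Case 1 vs. Case 2), the calculation would be: (110-53)/(.5*(110+53))
The end result would be a table that looks like this: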
| Version | Total | Case |
|---------|-------|------------|
| 1.0.1 | 70% | Case 1 & 2 |
| 1.0.2 | 71% | Case 1 & 2 |
| 1.0.3 | 68% | Case 1 & 2 |
| 1.0.4 | 67% | Case 1 & 2 |
| 1.0.5 | 69% | Case 1 & 2 |
| 1.0.1 | 72% | Case 3 & 4 |
| 1.0.2 | 71% | Case 3 & 4 |
| 1.0.3 | 67% | Case 3 & 4 |
| 1.0.4 | 67% | Case 3 & 4 |
| 1.0.5 | 69% | Case 3 & 4 |
Edit: here is a working example of the first table.
Version <- c('1.0.1', '1.0.2', '1.0.3', '1.0.4', '1.0.5', '1.0.1', '1.0.2', '1.0.3', '1.0.4', '1.0.5', '1.0.1', '1.0.2', '1.0.3', '1.0.4', '1.0.5', '1.0.1', '1.0.2', '1.0.3', '1.0.4', '1.0.5')
Total <- c(110, 111, 114, 114, 113, 53, 53, 56, 57, 55, 110, 111, 113, 114, 113, 52, 53, 56, 57, 55)
Case <- c('Case 1', 'Case 1', 'Case 1', 'Case 1', 'Case 1', 'Case 2', 'Case 2', 'Case 2', 'Case 2', 'Case 2', 'Case 3', 'Case 3', 'Case 3', 'Case 3', 'Case 3', 'Case 4', 'Case 4', 'Case 4', 'Case 4', 'Case 4')
df <- data.frame(Version, Total, Case)
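For reference, the calculation described above is a symmetric percent difference: the gap between two totals divided by their mean. A minimal sketch in base R follows; `pct_diff` is a hypothetical helper name that does not appear in the original post.

```r
# Hypothetical helper (not from the original post):
# difference between two values relative to their mean.
pct_diff <- function(a, b) {
  (a - b) / (0.5 * (a + b))
}

# Case 1 vs. Case 2 for version 1.0.1, as a fraction
pct_diff(110, 53)

# The same value as a whole-number percentage
round(100 * pct_diff(110, 53))  # 70
```

Because the function is vectorized, it can also be applied to whole columns once the cases are lined up side by side per version.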
Answer 0 (score: 3)
You can use data.table:
library(data.table)
setDT(df)
# Percent difference per version; each denominator uses the same pair of cases as its numerator
ans = df[, .(`case 1 & 2` = 200*(.SD[Case=="Case 1", Total] - .SD[Case=="Case 2", Total]) / (.SD[Case=="Case 1", Total] + .SD[Case=="Case 2", Total]),
`case 3 & 4` = 200*(.SD[Case=="Case 3", Total] - .SD[Case=="Case 4", Total]) / (.SD[Case=="Case 3", Total] + .SD[Case=="Case 4", Total])
), by=Version]
# Version case 1 & 2 case 3 & 4
# 1: 1.0.1 69.93865 71.60494
# 2: 1.0.2 70.73171 70.73171
# 3: 1.0.3 68.23529 67.45562
# 4: 1.0.4 66.66667 66.66667
# 5: 1.0.5 69.04762 69.04762
If you need the long format, you can use melt:
melt(ans, id="Version")
# Version variable value
# 1: 1.0.1 case 1 & 2 69.93865
# 2: 1.0.2 case 1 & 2 70.73171
# 3: 1.0.3 case 1 & 2 68.23529
# 4: 1.0.4 case 1 & 2 66.66667
# 5: 1.0.5 case 1 & 2 69.04762
# 6: 1.0.1 case 3 & 4 71.60494
# 7: 1.0.2 case 3 & 4 70.73171
# 8: 1.0.3 case 3 & 4 67.45562
# 9: 1.0.4 case 3 & 4 66.66667
#10: 1.0.5 case 3 & 4 69.04762
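One more piece of advice: I recommend not using spaces or special characters in column names. Although you can work around them by wrapping the name in backticks, they can cause problems. It's better to name the columns something like case_a_b.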
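As an illustration of that advice, here is a small sketch that renames a column containing spaces and `&` to a plain name using `setnames` from data.table. The table `res` and the new name `case_1_2` are assumptions made up for this example, not part of the answer above.

```r
library(data.table)

# Hypothetical two-row result with a non-syntactic column name
res <- data.table(
  Version      = c("1.0.1", "1.0.2"),
  `case 1 & 2` = c(69.93865, 70.73171)
)

# Rename by reference; afterwards no backticks are needed
setnames(res, old = "case 1 & 2", new = "case_1_2")

res$case_1_2
```

After the rename, the column can be referenced directly in `j` expressions and with `$` without any quoting.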
Answer 1 (score: 2)
Another solution using data.table with dcast:
library(data.table)
dt <- fread(" Version | Total | Case
1.0.1 | 110 | Case 1
1.0.2 | 111 | Case 1
1.0.3 | 114 | Case 1
1.0.4 | 114 | Case 1
1.0.5 | 113 | Case 1
1.0.1 | 53 | Case 2
1.0.2 | 53 | Case 2
1.0.3 | 56 | Case 2
1.0.4 | 57 | Case 2
1.0.5 | 55 | Case 2
1.0.1 | 110 | Case 3
1.0.2 | 111 | Case 3
1.0.3 | 113 | Case 3
1.0.4 | 114 | Case 3
1.0.5 | 113 | Case 3
1.0.1 | 52 | Case 4
1.0.2 | 53 | Case 4
1.0.3 | 56 | Case 4
1.0.4 | 57 | Case 4
1.0.5 | 55 | Case 4 ")
dcast(dt, Version ~ Case, value.var = "Total")[,
.(Version, Case_1_2 = (`Case 1`-`Case 2`)/(.5*(`Case 1`+`Case 2`)),
Case_3_4 = (`Case 3`-`Case 4`)/(.5*(`Case 3`+`Case 4`)))]
Version Case_1_2 Case_3_4
1: 1.0.1 0.6993865 0.7160494
2: 1.0.2 0.7073171 0.7073171
3: 1.0.3 0.6823529 0.6745562
4: 1.0.4 0.6666667 0.6666667
5: 1.0.5 0.6904762 0.6904762
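To get from this wide result to the long table with percentage strings shown in the question, one possible continuation is to `melt` the result and format the values with `sprintf`. This is only a sketch: the names `wide`, `res`, `long`, and `out` are assumptions introduced here, and the data is rebuilt with `data.table()` so the block runs on its own.

```r
library(data.table)

# Rebuild the example data from the question
dt <- data.table(
  Version = rep(c("1.0.1", "1.0.2", "1.0.3", "1.0.4", "1.0.5"), times = 4),
  Total   = c(110, 111, 114, 114, 113, 53, 53, 56, 57, 55,
              110, 111, 113, 114, 113, 52, 53, 56, 57, 55),
  Case    = rep(paste("Case", 1:4), each = 5)
)

# One row per version, one column per case
wide <- dcast(dt, Version ~ Case, value.var = "Total")

# Percent difference (as a fraction) for each pair of cases
res <- wide[, .(Version,
                `Case 1 & 2` = (`Case 1` - `Case 2`) / (0.5 * (`Case 1` + `Case 2`)),
                `Case 3 & 4` = (`Case 3` - `Case 4`) / (0.5 * (`Case 3` + `Case 4`)))]

# Back to long format, using melt's default column names "variable" and "value"
long <- melt(res, id.vars = "Version")

# Format as whole-number percentages and match the column layout in the question
out <- long[, .(Version,
                Total = sprintf("%.0f%%", 100 * value),
                Case  = as.character(variable))]
out
```

The backticked names are kept here only so the `Case` labels match the question's table; with the earlier advice in mind, plain names such as `Case_1_2` could be used and relabeled at the end.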