Question

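I have a large table in which, for each category (id), I computed the count per subcategory (countsperc; the subcategory names are not shown), then the total number of observations of each category in column id (sumofcounts), and the share of each subcategory in that total, countsperc/sumofcounts (the approximate proportion, apppropor), which has to be rounded to 3 decimal places.
The problem is that the sum of the approximate proportions (old_sum) within each category (id) must be 1.000, not 0.999 etc.
So I would like a way to add or subtract 0.001 to one of the entries in column apppropor so that the total is always 1.000. For example, row 1 could be 0.334 instead of 0.333.
Edit: The goal is not merely to produce an exact sum of 1, which would be useless on its own, but to produce input for another program that takes the apppropor column as-is and requires it to sum to 1.000 per id (see the error message below).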

text1<-"
id    countsperc sumofcounts   apppropor     
item1          1           3       0.333     
item1          1           3       0.333     
item1          1           3       0.333     
item2          1         121       0.008     
item2        119         121       0.983     
item2          1         121       0.008     
item3          1          44       0.023    
item3          1          44       0.023     
item3         41          44       0.932     
item3          1          44       0.023     
item4          1          29       0.034     
item4          3          29       0.103      
item4          1          29       0.034   
item4         24          29       0.828"
table1<-read.table(text=text1,header=TRUE)
library(data.table)
# old_sum: the sum of the rounded proportions (apppropor) within each id
sums<-as.data.frame(setDT(table1)[, sum(`apppropor`), by = .(id)][,.(id, old_sum = V1)])
table1<-merge(table1,sums)
table1

chromEvol version: 2.0. Last updated December 2013

The count probabilities for taxa Ad_mic do not sum to 1.0
chromEvol: errorMsg.cpp:41: static void errorMsg::reportError(const string&, int): Assertion `0' failed.
Aborted (core dumped)
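One common way to meet this constraint directly, which was not suggested in the thread, is largest-remainder (Hamilton) rounding: floor every proportion to 0.001, then hand out the missing 0.001 units, one each, to the rows with the largest remainders. In the sketch below, `round_to_one` is a hypothetical helper, and the vectors simply re-enter the `id` and `countsperc` columns of the sample data. Each id then sums to 1.000, and every rounded value stays within 0.001 of its exact proportion.

```r
# Hypothetical helper (largest-remainder rounding): proportions of `counts`,
# rounded to `digits` decimals so that they add up to exactly 1.
round_to_one <- function(counts, digits = 3) {
  scale <- 10^digits
  exact <- counts / sum(counts) * scale       # exact proportions in units of 10^-digits
  floored <- floor(exact)                     # round every proportion down
  shortfall <- round(scale - sum(floored))    # number of 10^-digits units still missing
  if (shortfall > 0) {
    # give one extra unit to each of the entries with the largest remainders
    idx <- order(exact - floored, decreasing = TRUE)[seq_len(shortfall)]
    floored[idx] <- floored[idx] + 1
  }
  floored / scale
}

# id and countsperc columns from the sample data
id <- rep(c("item1", "item2", "item3", "item4"), times = c(3, 3, 4, 4))
countsperc <- c(1, 1, 1, 1, 119, 1, 1, 1, 41, 1, 1, 3, 1, 24)

# apply the rounding separately within each id
apppropor <- ave(countsperc, id, FUN = round_to_one)
tapply(apppropor, id, sum)
```

The adjusted row is chosen by remainder rather than by position, so for item2 this gives 0.008, 0.984, 0.008 (the 0.983 entry is the one that changes).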

Answer 1

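If you need sum_of_prop to equal 1 in every row, you are calculating it the wrong way. You don't add 0.333 + 0.333 + 0.333 and then force that sum to be 1; you add (1/3) + (1/3) + (1/3), and then the sum actually is 1.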
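A small illustration of that difference in R (not part of the original answer): summing the exact thirds gives 1, while summing the thirds after rounding them to 3 decimals gives 0.999.

```r
exact <- rep(1 / 3, 3)         # three exact thirds
rounded <- round(exact, 3)     # the same thirds rounded to 0.333

isTRUE(all.equal(sum(exact), 1))    # TRUE: the exact proportions add up to 1
isTRUE(all.equal(sum(rounded), 1))  # FALSE: 0.333 * 3 = 0.999
sum(rounded)
```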

Assuming no other columns can be changed, try computing sum_of_prop like this:

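n <- length(table1$id)
new_sum_of_prop <- rep(0, n)
for (i in 1:n) {
  # total the raw counts of all rows that share row i's id
  tempitem <- table1$id[i]
  tempsum <- sum(table1$countsperc[(table1$id == tempitem)])
  # sum of the exact proportions countsperc / sumofcounts for this id
  new_sum_of_prop[i] <- tempsum / table1$sumofcounts[i]
}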

table2 <- as.data.frame(cbind(table1, new_sum_of_prop))
table2
      id countsperc sumofcounts apppropor sum_of_prop new_sum_of_prop
1  item1          1           3     0.333       0.999               1
2  item1          1           3     0.333       0.999               1
3  item1          1           3     0.333       0.999               1
4  item2          1         121     0.008       0.999               1
5  item2        119         121     0.983       0.999               1
6  item2          1         121     0.008       0.999               1
7  item3          1          44     0.023       1.001               1
8  item3          1          44     0.023       1.001               1
9  item3         41          44     0.932       1.001               1
10 item3          1          44     0.023       1.001               1
11 item4          1          29     0.034       0.999               1
12 item4          3          29     0.103       0.999               1
13 item4          1          29     0.034       0.999               1
14 item4         24          29     0.828       0.999               1
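A vectorised way to get the per-id total of the exact proportions countsperc / sumofcounts, without the explicit loop, is base R's `ave()`, which applies a function within each group and repeats the result on every row of that group. This sketch re-enters the needed sample columns as plain vectors rather than using `table1`.

```r
# Columns from the sample data (only the ones needed here)
id <- rep(c("item1", "item2", "item3", "item4"), times = c(3, 3, 4, 4))
countsperc <- c(1, 1, 1, 1, 119, 1, 1, 1, 41, 1, 1, 3, 1, 24)
sumofcounts <- rep(c(3, 121, 44, 29), times = c(3, 3, 4, 4))

# Exact (unrounded) proportion for every row
exact_prop <- countsperc / sumofcounts

# Per-id sum of the exact proportions, repeated on every row of that id
new_sum_of_prop <- ave(exact_prop, id, FUN = sum)
new_sum_of_prop
```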

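I know this isn't exactly what you asked for, but in the long run your results will always be sounder if you don't cut mathematical corners along the way.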

Answer 2

I found a way:

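# residual between 1.000 and the current sum of each id (negative if the sum is above 1)
table1$dif<-1-table1$old_sum
# sort so that the rows of each id are contiguous
table1<-table1[order(table1$id),]
# run lengths of each id; cumsum(len) gives the index of the last row of each id
len<-rle(as.vector(table1$id))[[1]]
# add the residual to the last row of each id
table1$apppropor[cumsum(len)]<-table1$apppropor[cumsum(len)]+table1$dif[cumsum(len)]
#verify
library(data.table)
sums<-as.data.frame(setDT(table1)[, sum(`apppropor`), by = .(id)][,.(id, new_sum = V1)])
table1<-merge(table1,sums)
table1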
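This approach adds the whole rounding residual to the last row of each id. A base-R variant of the same idea, sketched here under the assumption that the residual should instead go to the row with the largest proportion (so that a small share such as 0.008 is not the one that gets distorted), could look like this; `fix_residual` is a hypothetical helper, and the vectors re-enter the `id` and `apppropor` columns of the sample data.

```r
# Hypothetical helper: put the whole rounding residual (1 - sum) on the
# largest element of the group, then re-round to `digits` decimals to drop
# floating-point noise from the addition.
fix_residual <- function(p, digits = 3) {
  i <- which.max(p)
  p[i] <- round(p[i] + (1 - sum(p)), digits)
  p
}

# id and apppropor columns from the sample data
id <- rep(c("item1", "item2", "item3", "item4"), times = c(3, 3, 4, 4))
apppropor <- c(0.333, 0.333, 0.333, 0.008, 0.983, 0.008,
               0.023, 0.023, 0.932, 0.023, 0.034, 0.103, 0.034, 0.828)

# fix each id separately
apppropor <- ave(apppropor, id, FUN = fix_residual)
tapply(apppropor, id, sum)
```

For item3, whose rounded proportions sum to 1.001, this lowers 0.932 to 0.931 instead of changing the final 0.023.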

Keeping the sum of approximate proportions at 1 (= 100%) in R
