How to calculate specific percentages

Time: 2017-10-13 14:29:40

Tags: r data.table percentage

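I have a data table in which the data is given at 4 different levels (levels 0, 1, 2 and 3). For each state, I want to calculate how the level 3 supply is distributed. (I have kept occ_code in the table so that each record is unique even when state_code and level are the same.)

Creating a sample table: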

library(data.table)
state_code = rep(c(1, 2), each = 7)  # two states with seven rows each
level = c(0,1,2,3,3,2,3,1,2,3,3,3,2,3)
occ_code = LETTERS[1:14]
supply = c(100,60,50,25,25,10,10,40,30,10,10,10,10,10)    
DT = data.table(state_code,occ_code,level,supply)

Desired output:

perc = c(NA,NA,NA,0.5,0.5,NA,1,NA,NA,0.33,0.33,0.33,NA,1)
DT2 = data.table(DT,perc)

Basically, I want to use these percentages to project another dataset that is given only at level 2.

2 answers:

Answer 0: (score: 3)

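A possible solution: number each consecutive run of level values with rleid(), then compute the shares within each run of level 3 rows: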

DT[, rl := rleid(level), by = state_code
   ][level == 3, perc := supply/sum(supply), by = .(state_code, rl)
     ][, rl := NULL][]

which gives:

> DT
    state_code occ_code level supply      perc
 1:          1        A     0    100        NA
 2:          1        B     1     60        NA
 3:          1        C     2     50        NA
 4:          1        D     3     25 0.5000000
 5:          1        E     3     25 0.5000000
 6:          1        F     2     10        NA
 7:          1        G     3     10 1.0000000
 8:          2        H     1     40        NA
 9:          2        I     2     30        NA
10:          2        J     3     10 0.3333333
11:          2        K     3     10 0.3333333
12:          2        L     3     10 0.3333333
13:          2        M     2     10        NA
14:          2        N     3     10 1.0000000
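To see why grouping by the run id works, it can help to look at what rleid() returns for one state's level column. This is only a small illustration using the level values of state 1 from the sample data; the variable name runs is made up for the example.

```r
library(data.table)

# level values for state 1 in the sample data
level <- c(0, 1, 2, 3, 3, 2, 3)

# rleid() gives every consecutive run of equal values its own id,
# so the two adjacent level 3 rows share an id and the later,
# separate level 3 row gets a different one
runs <- rleid(level)
print(runs)
# 1 2 3 4 4 5 6
```

Because each block of level 3 rows sits directly under its level 2 row, each run of 3s corresponds to one parent, and supply/sum(supply) within the run is the share of that parent.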

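Answer 1: (score: 1)

Restructure the data so that only the level 3 information is stored; here the parent (level 2) code of each row is entered by hand. The other information can then be calculated as follows: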

library(data.table)
dt3 <- DT[level == 3, ]
dt3[, parent := c("2C", "2C", "2F", "2I", "2I", "2I", "2M")]
dt3[, perc := round(supply / sum(supply), 4), by = parent]

   state_code occ_code level supply parent   perc
1:          1        D     3     25     2C 0.5000
2:          1        E     3     25     2C 0.5000
3:          1        G     3     10     2F 1.0000
4:          2        J     3     10     2I 0.3333
5:          2        K     3     10     2I 0.3333
6:          2        L     3     10     2I 0.3333
7:          2        N     3     10     2M 1.0000

The supply for levels 0, 1 and 2, respectively, can then be computed as:

dt3[, sum(supply)]
dt3[, sum(supply), by = state_code]
dt3[, sum(supply), by = parent]
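As a quick sanity check, one could compare these aggregates with the level 0, 1 and 2 rows of the original table. The sketch below rebuilds the sample data (assuming two states with seven rows each, as in the printed output) so that it runs on its own; the names l2_from_l3 and l2_actual are made up for the example.

```r
library(data.table)

# Sample data, assuming two states with seven rows each
DT <- data.table(
  state_code = rep(c(1, 2), each = 7),
  occ_code   = LETTERS[1:14],
  level      = c(0, 1, 2, 3, 3, 2, 3, 1, 2, 3, 3, 3, 2, 3),
  supply     = c(100, 60, 50, 25, 25, 10, 10, 40, 30, 10, 10, 10, 10, 10)
)

# Level 3 rows with the hand-entered parent codes from the answer
dt3 <- DT[level == 3]
dt3[, parent := c("2C", "2C", "2F", "2I", "2I", "2I", "2M")]

# Level 2 supply rebuilt from level 3, next to the stored level 2 rows
l2_from_l3 <- dt3[, .(supply = sum(supply)), by = parent]
l2_actual  <- DT[level == 2, .(parent = paste0(level, occ_code), supply)]

# Both comparisons should be TRUE for this sample data
print(all.equal(l2_from_l3$supply, l2_actual$supply))
print(all.equal(dt3[, sum(supply)], DT[level == 0, sum(supply)]))
```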

Second approach, deriving parent from the data instead of typing it in:

DT[level == 2, parent := paste0(level, occ_code)]
DT[level > 1, parent := parent[1], by = .(cumsum(!is.na(parent)))]
DT[level == 3, perc := round(supply / sum(supply), 4), by = parent]

    state_code occ_code level supply parent   perc
 1:          1        A     0    100     NA     NA
 2:          1        B     1     60     NA     NA
 3:          1        C     2     50     2C     NA
 4:          1        D     3     25     2C 0.5000
 5:          1        E     3     25     2C 0.5000
 6:          1        F     2     10     2F     NA
 7:          1        G     3     10     2F 1.0000
 8:          2        H     1     40     NA     NA
 9:          2        I     2     30     2I     NA
10:          2        J     3     10     2I 0.3333
11:          2        K     3     10     2I 0.3333
12:          2        L     3     10     2I 0.3333
13:          2        M     2     10     2M     NA
14:          2        N     3     10     2M 1.0000
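The question mentions using these percentages to project another dataset that is only available at level 2. A minimal sketch of that step follows. The table other, its demand column and all of its numbers are made up for illustration, and the parent is derived here with a running counter (a variation on the approach above that stores the bare occ_code of the level 2 row) rather than with the exact code from the answer.

```r
library(data.table)

# Sample data, assuming two states with seven rows each
DT <- data.table(
  state_code = rep(c(1, 2), each = 7),
  occ_code   = LETTERS[1:14],
  level      = c(0, 1, 2, 3, 3, 2, 3, 1, 2, 3, 3, 3, 2, 3),
  supply     = c(100, 60, 50, 25, 25, 10, 10, 40, 30, 10, 10, 10, 10, 10)
)

# Running id that increases at every level 2 row
DT[, grp := cumsum(level == 2)]
# Among level 2 and 3 rows, the first row of each group is the level 2 parent
DT[level >= 2, parent := occ_code[1], by = grp]
# Unrounded share of each level 3 row within its parent
DT[level == 3, perc := supply / sum(supply), by = parent]

# Hypothetical second dataset, given only at level 2 (made-up numbers)
other <- data.table(parent = c("C", "F", "I", "M"),
                    demand = c(80, 20, 60, 40))

# Attach the level 2 figure to each level 3 row and split it by share
proj <- merge(DT[level == 3], other, by = "parent")
proj[, demand_l3 := demand * perc]
print(proj[, .(state_code, occ_code, parent, perc, demand_l3)])
```

Keeping perc unrounded here means the projected level 3 values add back up to the level 2 figures; with shares rounded to 4 digits, the three rows under I would sum to slightly less than their parent.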