我有一个数据表,其中数据以4个不同的级别(级别0,1,2和3)给出。我想计算每个州的第3级供水如何分配。 (我已将occ_code
保留在表中,以便即使state_code
和level
相同,也是唯一的记录)
创建样本表:
library(data.table)
state_code = c(rep(1,14))
level = c(0,1,2,3,3,2,3,1,2,3,3,3,2,3)
occ_code = LETTERS[1:14]
supply = c(100,60,50,25,25,10,10,40,30,10,10,10,10,10)
DT = data.table(state_code,occ_code,level,supply)
期望的输出
perc = c(NA,NA,NA,0.5,0.5,NA,1,NA,NA,0.33,0.33,0.33,NA,1)
DT2 = data.table(DT,perc)
基本上,我想使用这些百分比投影另一个仅在第2级给出的数据。
答案 0 :(得分:3)
可能的解决方案:
DT[, rl := rleid(level), by = state_code
][level == 3, perc := supply/sum(supply), by = .(state_code, rl)
][, rl := NULL][]
给出:
> DT state_code occ_code level supply perc 1: 1 A 0 100 NA 2: 1 B 1 60 NA 3: 1 C 2 50 NA 4: 1 D 3 25 0.5000000 5: 1 E 3 25 0.5000000 6: 1 F 2 10 NA 7: 1 G 3 10 1.0000000 8: 2 H 1 40 NA 9: 2 I 2 30 NA 10: 2 J 3 10 0.3333333 11: 2 K 3 10 0.3333333 12: 2 L 3 10 0.3333333 13: 2 M 2 10 NA 14: 2 N 3 10 1.0000000
答案 1 :(得分:1)
重组数据以仅存储第3级的信息。其他信息可以通过以下方式计算:
library(data.table)
dt3 <- DT[level == 3, ]
dt3[, parent := c("2C", "2C", "2F", "2I", "2I", "2I", "2M")]
dt3[, perc := round(supply / sum(supply), 4), by = parent]
state_code occ_code level supply parent perc
1: 1 D 3 25 2C 0.5000
2: 1 E 3 25 2C 0.5000
3: 1 G 3 10 2F 1.0000
4: 2 J 3 10 2I 0.3333
5: 2 K 3 10 2I 0.3333
6: 2 L 3 10 2I 0.3333
7: 2 N 3 10 2M 1.0000
分别为supply
0,1和2计算level
:
dt3[, sum(supply)]
dt3[, sum(supply), by = state_code]
dt3[, sum(supply), by = parent]
第二种方法:
DT[level == 2, parent := paste0(level, occ_code)]
DT[level > 1, parent := parent[1], by = .(cumsum(!is.na(parent)))]
DT[level == 3, perc := round(supply / sum(supply), 4), by = parent]
state_code occ_code level supply parent perc
1: 1 A 0 100 NA NA
2: 1 B 1 60 NA NA
3: 1 C 2 50 2C NA
4: 1 D 3 25 2C 0.5000
5: 1 E 3 25 2C 0.5000
6: 1 F 2 10 2F NA
7: 1 G 3 10 2F 1.0000
8: 2 H 1 40 NA NA
9: 2 I 2 30 2I NA
10: 2 J 3 10 2I 0.3333
11: 2 K 3 10 2I 0.3333
12: 2 L 3 10 2I 0.3333
13: 2 M 2 10 2M NA
14: 2 N 3 10 2M 1.0000