Calculating factors between two data frames in R

Asked: 2018-08-16 20:41:17

Tags: r dataframe

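I haven't found a solution yet. I think this should be simple, but I can't figure it out right now.

I have two data frames: monthly flow averages (TPDM) and annual flow averages (TPDA). I need to divide each annual average by the corresponding monthly averages.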

   ano mes dias  Au_TPDM  Bu_TPDM  CU_TPDM CAI_TPDM CAII_TPDM    TOTAL
1  2012 Ene   31 4288.323 620.5161 236.7419 4635.097  139.0645 6112.258
7  2012 Feb   29 3268.862 593.0000 246.3103 5191.069  147.9655 6267.286
13 2012 Mar   31 3667.903 624.7097 289.0323 5341.774  154.7419 6740.226
19 2012 Abr   30 4668.767 647.2333 281.2667 4930.433  158.3000 7236.300
25 2012 May   31 3198.581 598.9677 256.1290 5384.742  202.2581 6612.581
31 2012 Jun   30 3609.067 605.8667 280.3333 5309.500  178.7000 6795.000

 anosDB  TPDA_Au  TPDA_Bu  TPDA_CU TPDA_CAI TPDA_CAII TPDA_TOTAL
1   2012 4271.096 617.4809 255.1967 5119.454  163.5055   10426.73
2   2013 4685.079 638.5616 259.8877 5287.822  154.0110   11025.36
3   2014 4969.277 656.3918 266.8986 5407.800  177.0932   11477.46
4   2015 5184.953 541.8822 400.2137 4941.422  271.6877   11340.16
5   2016 5220.872 408.6967 541.0519 5584.492  182.4399   11937.55
6   2017 5298.852 408.7562 556.5644 6033.652  266.1644   12563.99

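So the first row of the TPDA table should be divided by each of the first 12 rows of the TPDM table, producing a new data frame that contains the monthly factors. Like this: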

ano mes dias FA_Au
2012 Ene 31 4271.096/4288.323
2012 Feb 29 4271.096/3268.862

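(There is no need to show the calculation, only the result.) I'm sure that selecting the data by year will do it, but I haven't found the right way yet.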

1 Answer:

Answer 0: (score: 0)

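Merge by year and pick the columns to divide by position

As mentioned by zx8754, this can be done in base R by merging on the year and dividing the corresponding columns: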

merged <- merge(TPDM, TPDA, by.x = "ano", by.y = "anosDB")
FA <- cbind(merged[, 1:3], merged[, 10:15]/merged[, 4:9])
# rename columns
names(FA) <- sub("TPDA_", "FA_", names(FA))
FA
   ano mes dias     FA_Au     FA_Bu     FA_CU    FA_CAI   FA_CAII FA_TOTAL
1 2012 Ene   31 0.9959828 0.9951086 1.0779532 1.1044977 1.1757530 1.705872
2 2012 Feb   29 1.3066003 1.0412831 1.0360781 0.9862042 1.1050245 1.663675
3 2012 Mar   31 1.1644517 0.9884285 0.8829349 0.9583809 1.0566337 1.546941
4 2012 Abr   30 0.9148231 0.9540314 0.9073122 1.0383376 1.0328838 1.440892
5 2012 May   31 1.3353096 1.0309085 0.9963600 0.9507334 0.8084003 1.576802
6 2012 Jun   30 1.1834349 1.0191696 0.9103332 0.9642064 0.9149720 1.534471
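One thing worth keeping in mind about the merge() call above: by default it performs an inner join, so monthly rows whose year has no matching annual row are dropped without any message. The sketch below uses a subset of the question's data to show this. The name TPDA_partial and the deliberate omission of the 2012 row are assumptions made purely for illustration.

```r
# Two monthly rows from the question's data
TPDM <- read.table(text = "
ano mes dias Au_TPDM Bu_TPDM CU_TPDM CAI_TPDM CAII_TPDM TOTAL
2012 Ene 31 4288.323 620.5161 236.7419 4635.097 139.0645 6112.258
2012 Feb 29 3268.862 593.0000 246.3103 5191.069 147.9655 6267.286
", header = TRUE)

# Annual rows for 2013 and 2014 only; 2012 is left out on purpose (illustrative)
TPDA_partial <- read.table(text = "
anosDB TPDA_Au TPDA_Bu TPDA_CU TPDA_CAI TPDA_CAII TPDA_TOTAL
2013 4685.079 638.5616 259.8877 5287.822 154.0110 11025.36
2014 4969.277 656.3918 266.8986 5407.800 177.0932 11477.46
", header = TRUE)

# Years present in the monthly data but absent from the annual data
missing_years <- setdiff(unique(TPDM$ano), TPDA_partial$anosDB)

# Default inner join: the unmatched monthly rows disappear
inner <- merge(TPDM, TPDA_partial, by.x = "ano", by.y = "anosDB")

# Left join: the monthly rows are kept, with NA in the annual columns
left <- merge(TPDM, TPDA_partial, by.x = "ano", by.y = "anosDB", all.x = TRUE)
```

Checking missing_years before merging, or using all.x = TRUE, makes such gaps visible instead of silently shrinking the result.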

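Note: This approach works only as long as the positions (i.e., the column numbers) of the corresponding columns are known. In the given datasets the columns are ordered the same way, so the corresponding columns can be matched simply by accounting for the offset.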
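Because the positional approach depends on that ordering, it may help to verify the pairing before dividing. The following is a minimal sketch, not part of the original answer, that uses a subset of the question's data; the helper names monthly_pos, annual_pos, monthly_keys and annual_keys are assumptions introduced here.

```r
# Subset of the question's data (two months, two years)
TPDM <- read.table(text = "
ano mes dias Au_TPDM Bu_TPDM CU_TPDM CAI_TPDM CAII_TPDM TOTAL
2012 Ene 31 4288.323 620.5161 236.7419 4635.097 139.0645 6112.258
2012 Feb 29 3268.862 593.0000 246.3103 5191.069 147.9655 6267.286
", header = TRUE)

TPDA <- read.table(text = "
anosDB TPDA_Au TPDA_Bu TPDA_CU TPDA_CAI TPDA_CAII TPDA_TOTAL
2012 4271.096 617.4809 255.1967 5119.454 163.5055 10426.73
2013 4685.079 638.5616 259.8877 5287.822 154.0110 11025.36
", header = TRUE)

merged <- merge(TPDM, TPDA, by.x = "ano", by.y = "anosDB")

monthly_pos <- 4:9    # monthly columns in the merged data frame
annual_pos  <- 10:15  # annual columns in the merged data frame

# Strip the suffix/prefix so both sets of names reduce to the same keys
monthly_keys <- sub("_TPDM$", "", names(merged)[monthly_pos])
annual_keys  <- sub("^TPDA_", "", names(merged)[annual_pos])

# Stop early if the positional pairing would divide mismatched columns
stopifnot(identical(monthly_keys, annual_keys))

FA <- cbind(merged[, 1:3], merged[, annual_pos] / merged[, monthly_pos])
names(FA) <- sub("TPDA_", "FA_", names(FA))
```

If a column is later added or reordered in either table, the stopifnot() call fails rather than letting the division produce plausible but wrong factors.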

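Merge by year and pick the columns to divide by name

If for some reason the positions are not known in advance, we can find the corresponding columns by matching column names.

To do this, both datasets are reshaped from wide to long format. In long format, the column names (now stored in a column called variable) are treated as data. We can then join the monthly and annual values on year and column name, divide each annual value by the corresponding monthly value, and finally reshape back to wide format: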

library(data.table)

# reshape and prepare monthly data
longM <- melt(setDT(TPDM), id.vars = 1:3)
longM[, variable := stringr::str_replace(variable, "_TPDM", "")]
longM[, mes := forcats::fct_inorder(mes)]

# reshape and prepare annual data
longA <- melt(setDT(TPDA), id.vars = 1)
longA[, variable := stringr::str_replace(variable, "TPDA_", "")]
setnames(longA, "anosDB", "ano")

# join
long_FA <- longA[longM, on = .(ano, variable), 
                 .(ano, mes, dias, variable, FA = value/i.value)]

# reshape back to wide format
dcast(long_FA, ano + mes + dias ~ paste0("FA_", variable), value.var = "FA")
    ano mes dias     FA_Au     FA_Bu    FA_CAI   FA_CAII     FA_CU FA_TOTAL
1: 2012 Ene   31 0.9959828 0.9951086 1.1044977 1.1757530 1.0779532 1.705872
2: 2012 Feb   29 1.3066003 1.0412831 0.9862042 1.1050245 1.0360781 1.663675
3: 2012 Mar   31 1.1644517 0.9884285 0.9583809 1.0566337 0.8829349 1.546941
4: 2012 Abr   30 0.9148231 0.9540314 1.0383376 1.0328838 0.9073122 1.440892
5: 2012 May   31 1.3353096 1.0309085 0.9507334 0.8084003 0.9963600 1.576802
6: 2012 Jun   30 1.1834349 1.0191696 0.9642064 0.9149720 0.9103332 1.534471
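If reshaping feels heavy for this task, the name matching can also be sketched in base R without any extra packages. This is an alternative to the data.table code above rather than the answerer's method. It assumes the naming pattern seen in the question (a _TPDM suffix on the monthly columns and a TPDA_ prefix on the annual ones), and the names id_cols, monthly_cols, keys and annual_cols are introduced here for illustration. The annual columns are shuffled on purpose to show that their order no longer matters.

```r
# Subset of the question's data (two months, two years)
TPDM <- read.table(text = "
ano mes dias Au_TPDM Bu_TPDM CU_TPDM CAI_TPDM CAII_TPDM TOTAL
2012 Ene 31 4288.323 620.5161 236.7419 4635.097 139.0645 6112.258
2012 Feb 29 3268.862 593.0000 246.3103 5191.069 147.9655 6267.286
", header = TRUE)

TPDA <- read.table(text = "
anosDB TPDA_Au TPDA_Bu TPDA_CU TPDA_CAI TPDA_CAII TPDA_TOTAL
2012 4271.096 617.4809 255.1967 5119.454 163.5055 10426.73
2013 4685.079 638.5616 259.8877 5287.822 154.0110 11025.36
", header = TRUE)

# Shuffle the annual columns to show that position is irrelevant here
TPDA <- TPDA[, c("anosDB", "TPDA_TOTAL", "TPDA_CU", "TPDA_Au",
                 "TPDA_CAII", "TPDA_Bu", "TPDA_CAI")]

id_cols      <- c("ano", "mes", "dias")
monthly_cols <- setdiff(names(TPDM), id_cols)
keys         <- sub("_TPDM$", "", monthly_cols)  # "Au", "Bu", ..., "TOTAL"
annual_cols  <- paste0("TPDA_", keys)            # annual partner of each monthly column

# Every monthly column must have an annual partner
stopifnot(all(annual_cols %in% names(TPDA)))

merged <- merge(TPDM, TPDA, by.x = "ano", by.y = "anosDB")

# Select both sides by name, so the pairs line up regardless of column order
FA <- cbind(merged[, id_cols], merged[, annual_cols] / merged[, monthly_cols])
names(FA) <- c(id_cols, paste0("FA_", keys))
```

Unlike the dcast() result, which lists the FA_ columns alphabetically, this version keeps them in the order of the monthly table.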

Data

TPDM <- read.table(text = "
i   ano mes dias  Au_TPDM  Bu_TPDM  CU_TPDM CAI_TPDM CAII_TPDM    TOTAL
1  2012 Ene   31 4288.323 620.5161 236.7419 4635.097  139.0645 6112.258
7  2012 Feb   29 3268.862 593.0000 246.3103 5191.069  147.9655 6267.286
13 2012 Mar   31 3667.903 624.7097 289.0323 5341.774  154.7419 6740.226
19 2012 Abr   30 4668.767 647.2333 281.2667 4930.433  158.3000 7236.300
25 2012 May   31 3198.581 598.9677 256.1290 5384.742  202.2581 6612.581
31 2012 Jun   30 3609.067 605.8667 280.3333 5309.500  178.7000 6795.000
", header = TRUE)[, -1L]

TPDA <- read.table(text = "
i anosDB  TPDA_Au  TPDA_Bu  TPDA_CU TPDA_CAI TPDA_CAII TPDA_TOTAL
1   2012 4271.096 617.4809 255.1967 5119.454  163.5055   10426.73
2   2013 4685.079 638.5616 259.8877 5287.822  154.0110   11025.36
3   2014 4969.277 656.3918 266.8986 5407.800  177.0932   11477.46
4   2015 5184.953 541.8822 400.2137 4941.422  271.6877   11340.16
5   2016 5220.872 408.6967 541.0519 5584.492  182.4399   11937.55
6   2017 5298.852 408.7562 556.5644 6033.652  266.1644   12563.99
", header = TRUE)[, -1L]