Average selected years for each country

Time: 2018-03-28 16:41:42

Tags: r aggregate

I want to aggregate selected groups of years and compute the mean of each group within each country, repeating the same steps for every country.

Here is my data:

country year    lgaspcar    lincomep
AUSTRIA 1960    4.173244195 -6.474277179
AUSTRIA 1961    4.1009891049    -6.426005835
AUSTRIA 1965    4.033983285 -6.294667914
AUSTRIA 1966    4.0475365589    -6.252545451
AUSTRIA 1967    4.0529106939    -6.234580709
AUSTRIA 1968    4.045507048 -6.206894403
BELGIUM 1960    4.16401597  -6.215091247
BELGIUM 1961    4.124355641 -6.176842928
BELGIUM 1962    4.075961692 -6.12963802
BELGIUM 1963    4.001266072 -6.094018799
BELGIUM 1964    3.994375414 -6.036461168
BELGIUM 1965    3.9515307039    -6.00725184
BELGIUM 1966    3.8205378359    -5.994108428
BELGIUM 1967    3.9068782151    -5.964811815
BELGIUM 1968    3.8286653779    -5.924692959
CANADA  1960    4.8552384411    -5.889713473
CANADA  1961    4.8265553731    -5.884343618
CANADA  1962    4.8505325093    -5.844552303
CANADA  1963    4.8380800488    -5.792351665
CANADA  1964    4.8397604783    -5.760063369
CANADA  1965    4.850827846 -5.722821552
CANADA  1966    4.871024855 -5.671784027
CANADA  1967    4.8524989572    -5.608481132
CANADA  1968    4.868782423 -5.573924431

The result I want:

country group     lgaspcar    lincomep
AUSTRIA   1   (1960+1961)/2  (1960+1961)/2  
AUSTRIA   2   (1962+1963)/2  (1962+1963)/2
.
.
.
BELGIUM   1   (1960+1961)/2  (1960+1961)/2
BELGIUM   2   (1962+1963)/2  (1962+1963)/2

As you may notice, the years and the length of the series are the same for every country.

I tried the following code:

aggregate(Gasoline[, 3:4], 
          list(Gasoline$country, 
               group=sample("1960:1962", "1963:1965", "1966:1978",54,rep=T)), 
          mean)

But the result I get is a single mean over all years for each country, not one mean per group of years.

Thanks in advance, everyone.

2 answers:

Answer 0: (score: 1)

I think you are looking for something like this:

1- Create a new variable for grouping

 Gasoline$groups <- cut(Gasoline$year, 
                           breaks = c(1960, 1962, 1965, 1968), 
                           include.lowest=TRUE)  

2- Get the means of lgaspcar and lincomep, aggregated by country and groups

  out <- aggregate(cbind(lgaspcar, lincomep)~country+groups, 
                     data=Gasoline, 
                     FUN=mean) 

3- Final output, sorted by country

  out[order(out$country), ]  
      country      groups lgaspcar  lincomep
    1 AUSTRIA [1960,1962] 4.137117 -6.450142
    4 AUSTRIA (1962,1965] 4.033983 -6.294668
    7 AUSTRIA (1965,1968] 4.048651 -6.231340
    2 BELGIUM [1960,1962] 4.121444 -6.173857
    5 BELGIUM (1962,1965] 3.982391 -6.045911
    8 BELGIUM (1965,1968] 3.852027 -5.961204
    3  CANADA [1960,1962] 4.844109 -5.872870
    6  CANADA (1962,1965] 4.842889 -5.758412
    9  CANADA (1965,1968] 4.864102 -5.618063
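
To see how the grouping variable from step 1 is built, the following minimal sketch applies cut to the years 1960 to 1968 on their own. The labels argument and the label strings are assumptions added for illustration; they are not part of the answer above.

```r
# Years covered by the sample data
years <- 1960:1968

# Same breaks as in step 1. cut() builds right-closed intervals by default;
# include.lowest = TRUE also closes the first interval on the left.
groups <- cut(years, breaks = c(1960, 1962, 1965, 1968), include.lowest = TRUE)

# Optional, assumed labels for readers who prefer year ranges to interval notation
groups_labeled <- cut(years, breaks = c(1960, 1962, 1965, 1968),
                      include.lowest = TRUE,
                      labels = c("1960-1962", "1963-1965", "1966-1968"))

# One row per year, showing which bin it falls into
data.frame(year = years, groups, groups_labeled)
```

Each year is assigned to exactly one bin, and the boundary years 1962 and 1965 belong to the earlier bin because the intervals are closed on the right.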

Input data:

Gasoline <- read.table(text="country year    lgaspcar    lincomep
AUSTRIA 1960    4.173244195 -6.474277179
                 AUSTRIA 1961    4.1009891049    -6.426005835
                 AUSTRIA 1965    4.033983285 -6.294667914
                 AUSTRIA 1966    4.0475365589    -6.252545451
                 AUSTRIA 1967    4.0529106939    -6.234580709
                 AUSTRIA 1968    4.045507048 -6.206894403
                 BELGIUM 1960    4.16401597  -6.215091247
                 BELGIUM 1961    4.124355641 -6.176842928
                 BELGIUM 1962    4.075961692 -6.12963802
                 BELGIUM 1963    4.001266072 -6.094018799
                 BELGIUM 1964    3.994375414 -6.036461168
                 BELGIUM 1965    3.9515307039    -6.00725184
                 BELGIUM 1966    3.8205378359    -5.994108428
                 BELGIUM 1967    3.9068782151    -5.964811815
                 BELGIUM 1968    3.8286653779    -5.924692959
                 CANADA  1960    4.8552384411    -5.889713473
                 CANADA  1961    4.8265553731    -5.884343618
                 CANADA  1962    4.8505325093    -5.844552303
                 CANADA  1963    4.8380800488    -5.792351665
                 CANADA  1964    4.8397604783    -5.760063369
                 CANADA  1965    4.850827846 -5.722821552
                 CANADA  1966    4.871024855 -5.671784027
                 CANADA  1967    4.8524989572    -5.608481132
                 CANADA  1968    4.868782423 -5.573924431", header=TRUE)
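
The desired output in the question uses fixed two-year windows, (1960+1961)/2, (1962+1963)/2 and so on, rather than hand-picked breaks. A minimal sketch of that variant follows, assuming a small hypothetical data frame df with the same column names as Gasoline; the values are made up for illustration only.

```r
# Hypothetical data: two countries, four consecutive years each
df <- data.frame(
  country  = rep(c("A", "B"), each = 4),
  year     = rep(1960:1963, times = 2),
  lgaspcar = c(4.0, 4.2, 4.4, 4.6, 3.0, 3.2, 3.4, 3.6)
)

# Integer division maps 1960-1961 to group 1, 1962-1963 to group 2, and so on
df$group <- (df$year - 1960) %/% 2 + 1

# Mean per country and two-year group, sorted by country
out <- aggregate(lgaspcar ~ country + group, data = df, FUN = mean)
out[order(out$country), ]
```

If a country has no observations in a window, aggregate simply returns no row for that combination.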

Answer 1: (score: 0)

You can create a new factor variable for the year groupings. Here is one way.

# load in data
library(data.table)
fread('country year    lgaspcar    lincomep
AUSTRIA 1960    4.173244195 -6.474277179
AUSTRIA 1961    4.1009891049    -6.426005835
AUSTRIA 1965    4.033983285 -6.294667914
AUSTRIA 1966    4.0475365589    -6.252545451
AUSTRIA 1967    4.0529106939    -6.234580709
AUSTRIA 1968    4.045507048 -6.206894403
BELGIUM 1960    4.16401597  -6.215091247
BELGIUM 1961    4.124355641 -6.176842928
BELGIUM 1962    4.075961692 -6.12963802
BELGIUM 1963    4.001266072 -6.094018799
BELGIUM 1964    3.994375414 -6.036461168
BELGIUM 1965    3.9515307039    -6.00725184
BELGIUM 1966    3.8205378359    -5.994108428
BELGIUM 1967    3.9068782151    -5.964811815
BELGIUM 1968    3.8286653779    -5.924692959
CANADA  1960    4.8552384411    -5.889713473
CANADA  1961    4.8265553731    -5.884343618
CANADA  1962    4.8505325093    -5.844552303
CANADA  1963    4.8380800488    -5.792351665
CANADA  1964    4.8397604783    -5.760063369
CANADA  1965    4.850827846 -5.722821552
CANADA  1966    4.871024855 -5.671784027
CANADA  1967    4.8524989572    -5.608481132
CANADA  1968    4.868782423 -5.573924431') -> d

findInterval (see ?findInterval) can be used to build the year-group factor, as follows:

factor(findInterval(d$year, c(1960, 1962, 1965, 1978), rightmost.closed=TRUE, left.open=FALSE), labels=c("1960:1962", "1963:1965", "1966:1978")) -> d$group

Now just compute the group/country means:

aggregate(lgaspcar ~ country + group, data=d, FUN=mean)

  country     group lgaspcar
1 AUSTRIA 1960:1962 4.137117
2 BELGIUM 1960:1962 4.144186
3  CANADA 1960:1962 4.840897
4 BELGIUM 1963:1965 4.023868
5  CANADA 1963:1965 4.842791
6 AUSTRIA 1966:1978 4.044984
7 BELGIUM 1966:1978 3.876903
8  CANADA 1966:1978 4.860784

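I should mention that the cut function used by @Jilber and the findInterval function used in my answer are nearly identical. The only difference between the two is their default behavior at the interval limits.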
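
The difference at the interval limits can be checked directly. A minimal sketch, assuming the years 1960 to 1968 from the sample data and the break points from the first answer (the findInterval call above uses 1978 as its last break, which does not change where 1962 and 1965 fall):

```r
years  <- 1960:1968
breaks <- c(1960, 1962, 1965, 1968)

# cut(): intervals are closed on the right by default,
# so 1962 and 1965 belong to the earlier bin
bins_cut <- as.integer(cut(years, breaks = breaks, include.lowest = TRUE))

# findInterval(): intervals are closed on the left by default,
# so 1962 and 1965 belong to the later bin
bins_fi <- findInterval(years, breaks, rightmost.closed = TRUE)

# left.open = TRUE switches findInterval() to cut()'s convention
bins_fi_open <- findInterval(years, breaks, rightmost.closed = TRUE,
                             left.open = TRUE)

data.frame(year = years, bins_cut, bins_fi, bins_fi_open)
```

With left.open = FALSE, as in the call above, 1962 and 1965 fall into the later bin, so the rows labeled 1960:1962 appear to average only 1960 and 1961. That would explain why the two answers report different first-group means for BELGIUM (4.121444 versus 4.144186).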