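I want to aggregate selected groups of years and compute the mean (average) for each country, repeating the same steps for every country.
Here is my data: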
country year lgaspcar lincomep
AUSTRIA 1960 4.173244195 -6.474277179
AUSTRIA 1961 4.1009891049 -6.426005835
AUSTRIA 1965 4.033983285 -6.294667914
AUSTRIA 1966 4.0475365589 -6.252545451
AUSTRIA 1967 4.0529106939 -6.234580709
AUSTRIA 1968 4.045507048 -6.206894403
BELGIUM 1960 4.16401597 -6.215091247
BELGIUM 1961 4.124355641 -6.176842928
BELGIUM 1962 4.075961692 -6.12963802
BELGIUM 1963 4.001266072 -6.094018799
BELGIUM 1964 3.994375414 -6.036461168
BELGIUM 1965 3.9515307039 -6.00725184
BELGIUM 1966 3.8205378359 -5.994108428
BELGIUM 1967 3.9068782151 -5.964811815
BELGIUM 1968 3.8286653779 -5.924692959
CANADA 1960 4.8552384411 -5.889713473
CANADA 1961 4.8265553731 -5.884343618
CANADA 1962 4.8505325093 -5.844552303
CANADA 1963 4.8380800488 -5.792351665
CANADA 1964 4.8397604783 -5.760063369
CANADA 1965 4.850827846 -5.722821552
CANADA 1966 4.871024855 -5.671784027
CANADA 1967 4.8524989572 -5.608481132
CANADA 1968 4.868782423 -5.573924431
The result I want:
country group lgaspcar lincomep
AUSTRIA 1 (1960+1961)/2 (1960+1961)/2
AUSTRIA 2 (1962+1963)/2 (1962+1963)/2
.
.
.
BELGIUM 1 (1960+1961)/2 (1960+1961)/2
BELGIUM 1 (1960+1961)/2 (1960+1961)/2
As you may notice, the years and the number of rows are the same for each country.
I tried the following code:
aggregate(Gasoline[, 3:4],
list(Gasoline$country,
group=sample("1960:1962", "1963:1965", "1966:1978",54,rep=T)),
mean)
But the result I get is the mean over all years for each country.
Thanks in advance, everyone.
Answer 0 (score: 1)
I think you are looking for something like this...
1- Create a new variable for the grouping
Gasoline$groups <- cut(Gasoline$year,
breaks = c(1960, 1962, 1965, 1968),
include.lowest=TRUE)
2- Get the mean of lgaspcar and lincomep aggregated by country and groups
out <- aggregate(cbind(lgaspcar, lincomep)~country+groups,
data=Gasoline,
FUN=mean)
3- Sort the result by country
out[order(out$country), ]
country groups lgaspcar lincomep
1 AUSTRIA [1960,1962] 4.137117 -6.450142
4 AUSTRIA (1962,1965] 4.033983 -6.294668
7 AUSTRIA (1965,1968] 4.048651 -6.231340
2 BELGIUM [1960,1962] 4.121444 -6.173857
5 BELGIUM (1962,1965] 3.982391 -6.045911
8 BELGIUM (1965,1968] 3.852027 -5.961204
3 CANADA [1960,1962] 4.844109 -5.872870
6 CANADA (1962,1965] 4.842889 -5.758412
9 CANADA (1965,1968] 4.864102 -5.618063
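If numeric group identifiers like the 1, 2, ... in the question's desired output are preferred over interval labels, cut() can return integer codes with labels = FALSE, or it can take custom labels. A minimal sketch, where `years` is a hypothetical stand-in for Gasoline$year:

```r
# Hypothetical stand-in for Gasoline$year (not the full data set)
years <- c(1960, 1961, 1962, 1963, 1965, 1966, 1968)

# labels = FALSE returns integer codes instead of a factor
codes <- cut(years,
             breaks = c(1960, 1962, 1965, 1968),
             include.lowest = TRUE,
             labels = FALSE)

# Alternatively, supply one readable label per interval
named <- cut(years,
             breaks = c(1960, 1962, 1965, 1968),
             include.lowest = TRUE,
             labels = c("1960-1962", "1963-1965", "1966-1968"))

print(data.frame(year = years, code = codes, label = named))
```

Either vector can be assigned to Gasoline$groups and used in the aggregate() call in the same way.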
Input data:
Gasoline <- read.table(text="country year lgaspcar lincomep
AUSTRIA 1960 4.173244195 -6.474277179
AUSTRIA 1961 4.1009891049 -6.426005835
AUSTRIA 1965 4.033983285 -6.294667914
AUSTRIA 1966 4.0475365589 -6.252545451
AUSTRIA 1967 4.0529106939 -6.234580709
AUSTRIA 1968 4.045507048 -6.206894403
BELGIUM 1960 4.16401597 -6.215091247
BELGIUM 1961 4.124355641 -6.176842928
BELGIUM 1962 4.075961692 -6.12963802
BELGIUM 1963 4.001266072 -6.094018799
BELGIUM 1964 3.994375414 -6.036461168
BELGIUM 1965 3.9515307039 -6.00725184
BELGIUM 1966 3.8205378359 -5.994108428
BELGIUM 1967 3.9068782151 -5.964811815
BELGIUM 1968 3.8286653779 -5.924692959
CANADA 1960 4.8552384411 -5.889713473
CANADA 1961 4.8265553731 -5.884343618
CANADA 1962 4.8505325093 -5.844552303
CANADA 1963 4.8380800488 -5.792351665
CANADA 1964 4.8397604783 -5.760063369
CANADA 1965 4.850827846 -5.722821552
CANADA 1966 4.871024855 -5.671784027
CANADA 1967 4.8524989572 -5.608481132
CANADA 1968 4.868782423 -5.573924431", header=TRUE)
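The desired output in the question averages consecutive pairs of years ((1960+1961)/2, (1962+1963)/2, ...). If that is what is actually wanted, the group can be computed arithmetically instead of with cut(). A rough sketch, assuming fixed two-year windows starting at 1960; `demo` and its values are made-up toy data, not the Gasoline data:

```r
# Toy data (hypothetical values) with the same layout as Gasoline
demo <- data.frame(
  country  = rep(c("A", "B"), each = 4),
  year     = rep(1960:1963, times = 2),
  lgaspcar = c(4.0, 4.2, 3.8, 4.0, 5.0, 5.2, 4.8, 5.0)
)

# Integer division maps 1960-1961 to group 1, 1962-1963 to group 2, ...
width <- 2
demo$group <- (demo$year - 1960) %/% width + 1

# Mean per country and two-year group
out_demo <- aggregate(lgaspcar ~ country + group, data = demo, FUN = mean)
print(out_demo[order(out_demo$country), ])
```

Changing `width` gives windows of a different length; windows of unequal length still need explicit breaks as in step 1.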
Answer 1 (score: 0)
You can create a new factor variable for the year groups. Here is one way.
# load in data
library(data.table)
fread('country year lgaspcar lincomep
AUSTRIA 1960 4.173244195 -6.474277179
AUSTRIA 1961 4.1009891049 -6.426005835
AUSTRIA 1965 4.033983285 -6.294667914
AUSTRIA 1966 4.0475365589 -6.252545451
AUSTRIA 1967 4.0529106939 -6.234580709
AUSTRIA 1968 4.045507048 -6.206894403
BELGIUM 1960 4.16401597 -6.215091247
BELGIUM 1961 4.124355641 -6.176842928
BELGIUM 1962 4.075961692 -6.12963802
BELGIUM 1963 4.001266072 -6.094018799
BELGIUM 1964 3.994375414 -6.036461168
BELGIUM 1965 3.9515307039 -6.00725184
BELGIUM 1966 3.8205378359 -5.994108428
BELGIUM 1967 3.9068782151 -5.964811815
BELGIUM 1968 3.8286653779 -5.924692959
CANADA 1960 4.8552384411 -5.889713473
CANADA 1961 4.8265553731 -5.884343618
CANADA 1962 4.8505325093 -5.844552303
CANADA 1963 4.8380800488 -5.792351665
CANADA 1964 4.8397604783 -5.760063369
CANADA 1965 4.850827846 -5.722821552
CANADA 1966 4.871024855 -5.671784027
CANADA 1967 4.8524989572 -5.608481132
CANADA 1968 4.868782423 -5.573924431') -> d
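findInterval (see ?findInterval) can be used to build the year-group factor. Note left.open=TRUE: it makes the intervals closed on the right, so that 1962 and 1965 fall into the groups their labels name, while rightmost.closed=TRUE then keeps 1960 in the first group:
d$group <- factor(findInterval(d$year, c(1960, 1962, 1965, 1978),
rightmost.closed=TRUE, left.open=TRUE),
labels=c("1960:1962", "1963:1965", "1966:1978"))
Now just compute the group/country means:
aggregate(lgaspcar ~ country + group, data=d, FUN=mean)
country group lgaspcar
1 AUSTRIA 1960:1962 4.137117
2 BELGIUM 1960:1962 4.121444
3 CANADA 1960:1962 4.844109
4 AUSTRIA 1963:1965 4.033983
5 BELGIUM 1963:1965 3.982391
6 CANADA 1963:1965 4.842889
7 AUSTRIA 1966:1978 4.048651
8 BELGIUM 1966:1978 3.852027
9 CANADA 1966:1978 4.864102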
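I should mention that the cut function used by @Jilber and the findInterval function in my answer do almost the same thing. They differ mainly in their default handling of interval bounds: cut uses right-closed intervals (a, b] by default, whereas findInterval uses left-closed intervals [a, b).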
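To make the boundary behaviour concrete, here is a small sketch comparing the two functions on years that sit exactly on the breaks. The vectors `yrs` and `brks` are illustrative names, not part of the original answers:

```r
yrs  <- c(1960, 1962, 1965, 1968)   # values lying exactly on the breaks
brks <- c(1960, 1962, 1965, 1968)

# cut(): intervals are (a, b] by default, so the lowest break is dropped (NA)
cut_default <- cut(yrs, breaks = brks, labels = FALSE)
print(cut_default)   # NA 1 2 3

# findInterval(): intervals are [a, b) by default, so the highest break
# lands past the last interval (index 4 here)
fi_default <- findInterval(yrs, brks)
print(fi_default)    # 1 2 3 4

# With matching options both give the same left-closed grouping
cut_match <- cut(yrs, breaks = brks, right = FALSE,
                 include.lowest = TRUE, labels = FALSE)
fi_match  <- findInterval(yrs, brks, rightmost.closed = TRUE)
print(cut_match)     # 1 2 3 3
print(fi_match)      # 1 2 3 3
```

Whichever function is used, it is worth checking a few boundary years like this before aggregating, since a year on a break can silently move to the neighbouring group.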