R aggregate by date, then by character

Date: 2015-05-13 21:29:04

Tags: r aggregate

My table looks like this:

Year    Country Variable 1  Variable 2
1970    UK            1       3
1970    USA           1       3
1971    UK            2       5
1971    UK            2       3
1971    UK            1       5
1971    USA           2       2
1972    USA           1       1
1972    USA           2       5
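To make the example reproducible, here is a minimal sketch that rebuilds the sample table as a data frame. It assumes the column names from the asker's code (`iyear`, `country_txt`, `V1`, `V2`) rather than the display headers above. The object is named `GT2Omit` only so that the answers' code can run on it directly. It is not the real 125,000-row data set.

```r
# Sample rows from the table above, using the column names from the asker's
# code (iyear, country_txt, V1, V2); a stand-in, not the real data set
GT2Omit <- data.frame(
  iyear       = c(1970, 1970, 1971, 1971, 1971, 1971, 1972, 1972),
  country_txt = c("UK", "USA", "UK", "UK", "UK", "USA", "USA", "USA"),
  V1          = c(1, 1, 2, 2, 1, 2, 1, 2),
  V2          = c(3, 3, 5, 3, 5, 2, 1, 5),
  stringsAsFactors = FALSE
)
str(GT2Omit)
```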

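I would be grateful if someone could tell me how to aggregate the data so that it is grouped by year, then by country, with the sums of Variable 1 and Variable 2 shown after them. The output would be:
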
Year    Country Sum Variable 1  Sum Variable 2
1970    UK              1           3
1970    USA             1           3
1971    UK              5           13
1971    USA             2           2
1972    USA             3           6

Here is the code I tried, which doesn't work (the real data frame is 125,000 rows by 30+ columns, hence the subset. Please be gentle, I'm new to R!)

# Make a subset of the data
GT2 <- subset(GT1, select = c("iyear", "country_txt", "V1", "V2"))
# Make sure the data types are correct
GT2[, 2] <- as.character(GT2[, 2])
GT2[, 3] <- as.numeric(as.character(GT2[, 3]))
GT2[, 4] <- as.numeric(as.character(GT2[, 4]))

# Remove rows with NA values
GT2Omit <- na.omit(GT2)

# Try to aggregate, i.e. group by year, then country, showing the sums of Variable 1 and Variable 2
aggGT2 <- aggregate(GT2Omit, by = list(GT2Omit$iyear, GT2Omit$country_txt), FUN = sum, na.rm = TRUE)

2 Answers:

Answer 0 (score: 2)

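Your aggregate call is almost right. Pass only the numeric columns (3 and 4) as the data to sum, because passing the whole data frame also makes sum() run on the character column country_txt. Then pass the grouping columns by name: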

> aggGT2 <-aggregate(GT2Omit[3:4], by=GT2Omit[c("country_txt", "iyear")], FUN=sum, na.rm=TRUE)
> aggGT2
  country_txt iyear V1 V2
1          UK  1970  1  3
2         USA  1970  1  3
3          UK  1971  5 13
4         USA  1971  2  2
5         USA  1972  3  6
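The same table can also be produced with the formula interface of `aggregate`. In that form, `cbind(V1, V2)` on the left names the columns to sum, and the right-hand side lists the grouping columns, so the character column is never handed to `sum`. The sketch below rebuilds the question's sample rows so that it runs on its own. With the formula interface, rows containing NA are dropped by default (`na.action = na.omit`). Listing `country_txt` before `iyear` is intended to match the column order of the answer above.

```r
# Sample rows from the question (stand-in for the cleaned GT2Omit)
GT2Omit <- data.frame(
  iyear       = c(1970, 1970, 1971, 1971, 1971, 1971, 1972, 1972),
  country_txt = c("UK", "USA", "UK", "UK", "UK", "USA", "USA", "USA"),
  V1          = c(1, 1, 2, 2, 1, 2, 1, 2),
  V2          = c(3, 3, 5, 3, 5, 2, 1, 5),
  stringsAsFactors = FALSE
)

# Formula interface: left side = columns to sum, right side = grouping columns
aggGT2 <- aggregate(cbind(V1, V2) ~ country_txt + iyear, data = GT2Omit, FUN = sum)
aggGT2
```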

Answer 1 (score: 1)

dplyr is almost always the answer these days.

library(dplyr)
# Group by year, then country; sum V1 and V2 within each group (uses the cleaned GT2Omit)
aggGT2 <- GT2Omit %>% group_by(iyear, country_txt) %>%
  summarize(sv1 = sum(V1), sv2 = sum(V2))

That said, it is worth learning base R functions such as aggregate and by.
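To illustrate the base R route, here is a sketch using `by()` on the question's sample rows, which are rebuilt here. `by()` splits the data frame by the grouping columns and applies a function to each piece. Here that function is `colSums`. Unlike `aggregate`, the result is indexed by year and country, and combinations with no rows, such as UK in 1972, come back as `NULL` rather than being left out.

```r
# Sample rows from the question (stand-in for the cleaned GT2Omit)
GT2Omit <- data.frame(
  iyear       = c(1970, 1970, 1971, 1971, 1971, 1971, 1972, 1972),
  country_txt = c("UK", "USA", "UK", "UK", "UK", "USA", "USA", "USA"),
  V1          = c(1, 1, 2, 2, 1, 2, 1, 2),
  V2          = c(3, 3, 5, 3, 5, 2, 1, 5),
  stringsAsFactors = FALSE
)

# Split V1/V2 by year and country, then sum each column within every group
res <- by(GT2Omit[c("V1", "V2")], GT2Omit[c("iyear", "country_txt")], colSums)
res

# Look up one group by its year and country labels
res[["1971", "UK"]]
```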