Aggregating categorical data

Time: 2016-01-17 06:53:26

Tags: r

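I am trying to sum the quarterly revenue values that belong to the same year. In Excel, the SUMIF function lets you do this. I am using the R equivalent, aggregate, but I am struggling to get it to work. The sums in the output are completely wrong. What am I doing wrong?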

> aggregate( . ~ dateyear,data=x,sum)
   dateyear revenue
1      2001     130
2      2002     176
3      2003     155
4      2004     159
5      2005     150
6      2006     161
7      2007     144
8      2008     120
9      2009      69
10     2010      54
11     2011      66
12     2012      92
13     2013     116
14     2014      94
15     2015      99

dput(x)
structure(list(dateyear = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 
6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 10L, 10L, 
10L, 10L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 13L, 13L, 13L, 
13L, 13L, 14L, 14L, 14L, 14L, 15L, 15L, 15L, 15L), .Label = c("2001", 
"2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", 
"2010", "2011", "2012", "2013", "2014", "2015"), class = "factor"), 
    revenue = structure(c(47L, 43L, 40L, 58L, 45L, 38L, 35L, 
    57L, 37L, 27L, 34L, 55L, 44L, 29L, 31L, 51L, 39L, 28L, 32L, 
    56L, 42L, 30L, 33L, 59L, 36L, 25L, 24L, 54L, 26L, 23L, 17L, 
    50L, 12L, 5L, 2L, 41L, 8L, 4L, 1L, 46L, 10L, 7L, 3L, 48L, 
    19L, 16L, 9L, 52L, 21L, 15L, 15L, 13L, 49L, 20L, 14L, 11L, 
    53L, 22L, 18L, 6L), .Label = c("1373", "1390.7", "1416.5", 
    "1420.8", "1455.2", "1472.9", "1475.1", "1482.7", "1486.3", 
    "1498.8", "1499.1", "1505.3", "1506.9", "1512.9", "1516.8", 
    "1525.2", "1546.1", "1550.8", "1583.2", "1588.5", "1589.4", 
    "1613.4", "1646.5", "1674.2", "1689.1", "1713.6", "1721.5", 
    "1728.5", "1730.1", "1748.6", "1755.1", "1761.2", "1762.6", 
    "1764.5", "1794.3", "1799.5", "1813.9", "1818", "1838.7", 
    "1872.3", "1875.4", "1879", "1885.6", "1911.9", "1972.8", 
    "1977", "1977.6", "2009.4", "2078.7", "2082.3", "2131.5", 
    "2154.1", "2179.6", "2208.1", "2299.1", "2379.6", "2387.9", 
    "2534", "2563.3"), class = "factor")), .Names = c("dateyear", 
"revenue"), row.names = c(NA, -60L), class = "data.frame")

1 answer:

Answer 0 (score: 2)

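As Ananda said, your revenue values are not actual numbers; they are strings encoded as integers (i.e. a factor). When you sum them, you are adding up the integer level codes rather than the revenue figures, which is why you get those odd numbers. After applying Ananda's code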

x$revenue <- as.numeric(as.character(x$revenue))
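To see why the extra as.character step matters, here is a minimal sketch using only the three 2001 revenue values from the dput output. The vector name revenue_f is made up for illustration. In this three-element toy factor the level codes are 3, 2, 1; in the full data the same three rows carry codes 47, 43 and 40, which add up to the 130 shown for 2001 in the question.

```r
# Toy factor holding the three 2001 revenue values from the question's data
# (revenue_f is a hypothetical name used only for this illustration)
revenue_f <- factor(c("1977.6", "1885.6", "1872.3"))

# Converting the factor directly returns its internal level codes,
# i.e. each value's position in the sorted levels, not the revenue
codes <- as.numeric(revenue_f)
print(codes)

# Going through character first recovers the real numbers
values <- as.numeric(as.character(revenue_f))
print(values)

# Summing the codes gives a meaningless small number;
# summing the recovered values gives the actual 2001 total
print(sum(codes))
print(sum(values))
```

The last line should agree with the 2001 row of the corrected aggregate output (5735.5).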

your aggregate call should work. This is what I get:

aggregate( revenue ~ dateyear,data=x,sum)
   dateyear revenue
1      2001  5735.5
2      2002  8119.1
3      2003  7687.8
4      2004  7696.2
5      2005  7459.9
6      2006  7769.8
7      2007  7726.1
8      2008  7114.3
9      2009  6433.5
10     2010  6151.9
11     2011  6367.4
12     2012  6604.1
13     2013  8284.0
14     2014  6679.2
15     2015  6816.7