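Unable to take the mean of columns (A, B, C, D) for each year, and sort the data frame by year for plotting in R

Time: 2015-03-27 08:17:05

Tags: r sorting mean

Guidelines:  1. The file used for "df" is reproduced below: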

    A       B      C       D          Date  Year
14.99    9.99   3.99    2.99      1/1/2002  2002
10.99    8.99   3.99    2.99      1/1/2006  2006
14.99    9.99                     1/1/2006  2006
14.99    9.99   3.99    2.99      1/1/1998  1998
14.99   12.99   3.99    2.99    12/25/2012  2012
10.99   10.99   3.99    2.99      4/1/2014  2014
14.99    9.99   3.99    2.99     4/15/2011  2011
14.99   12.99                    9/27/2013  2013
14.99   12.99                     5/2/2014  2014
14.99   12.99   3.99    2.99     6/17/2014  2014
14.99   12.99                     6/7/2013  2013
14.99   12.99   3.99    2.99      3/1/2013  2013
14.99    9.99   3.99    2.99    11/17/2007  2007
14.99    9.99   3.99    2.99      1/1/1987  1987
19.99   17.99   5.99    4.99     6/13/2014  2014
10.99    7.99   3.99    2.99     2/11/2014  2014
14.99   12.99   3.99    2.99      5/9/2014  2014
         9.99           2.99      1/1/2003  2003
14.99    9.99   3.99    2.99      1/1/2003  2003
14.99    9.99   3.99    2.99     11/2/2012  2012
14.99   12.99   3.99    2.99     7/17/2013  2013
14.99   12.99   3.99    2.99      7/1/1980  1980
10.99    8.99   3.99    2.99     9/30/2011  2011
         9.99           2.99      1/1/1996  1996
14.99   12.99                     3/7/2014  2014
14.99    9.99   3.99    2.99     7/29/1966  1966
         9.99                     1/1/1966  1966
14.99   12.99   3.99    2.99      3/5/2013  2013
14.99    9.99   3.99    2.99      1/1/1998  1998
12.99    9.99   3.99    2.99     7/11/2007  2007
14.99    9.99   3.99    2.99      1/1/2004  2004
14.99    9.99   3.99    2.99      1/1/1992  1992
14.99   12.99                    10/4/2013  2013
                6.99    6.99     1/30/2015  2015
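  1. Blank values are to be replaced by NA in the data.frame "df".
  2. Whenever a year has more than one row, only one row should be kept for that year, with the values in columns "A", "B", "C", "D" replaced by the mean for that year ("NA" values should not be counted when taking the mean).
  3. The data frame obtained from the steps above should then be sorted by year.
  4. I used the command below to aggregate and sort, but it did not work: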

    aggregate(x=df[,-c(5)], by=list(df$Year), FUN = Mean, na.rm=TRUE)
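
A likely reason the call above fails is that R is case-sensitive: base R provides `mean`, not `Mean`, so `FUN = Mean` stops with an "object not found" error unless some package defines that name. In addition, `df[,-c(5)]` only drops `Date`, so `Year` is averaged along with the price columns. The following is a minimal sketch of a corrected call, assuming blank cells have already been read in as `NA` and that columns A to D are numeric; the four-row sample is hypothetical and only mimics the shape of the data.

```r
# Hypothetical small sample shaped like the question's data; NA marks blank cells
df <- data.frame(
  A    = c(14.99, 10.99, 14.99, NA),
  B    = c(9.99, 8.99, 9.99, 9.99),
  C    = c(3.99, 3.99, NA, NA),
  D    = c(2.99, 2.99, NA, 2.99),
  Date = c("1/1/2002", "1/1/2006", "1/1/2006", "1/1/1996"),
  Year = c(2002, 2006, 2006, 1996)
)

# Lowercase mean(); only the price columns A-D are averaged.
# na.rm = TRUE is passed through to mean() so NA cells are skipped.
res <- aggregate(df[, c("A", "B", "C", "D")],
                 by = list(Year = df$Year),
                 FUN = mean, na.rm = TRUE)

# aggregate() returns one row per year, ordered by the grouping column
print(res)
```

A group whose values are all `NA` in a column comes back as `NaN` in that column, because nothing is left to average after the `NA` values are dropped.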
    

2 Answers:

Answer 0: (score: 1)

You can use the sqldf library to compute the mean of each column by year and sort the result by year, as follows:

library(sqldf)
sqldf("select year, avg(A), avg(B), avg(C), avg(D) from df group by year order by year")

The avg() function ignores NAs.
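
To make "ignores NAs" concrete, here is a small base R sketch of the same idea using `mean(..., na.rm = TRUE)`. The values are taken from the question's data (column C for 2006 and column A for 2015); the variable names are made up for illustration.

```r
# Column C for 2006: one price and one blank cell (NA)
c_2006 <- c(3.99, NA)

# Column A for 2015: the only row for that year is blank
a_2015 <- c(NA_real_)

with_na    <- mean(c_2006)                 # NA: a missing value poisons the mean
without_na <- mean(c_2006, na.rm = TRUE)   # average of the non-missing values only
all_blank  <- mean(a_2015, na.rm = TRUE)   # NaN: nothing left to average

print(c(with_na, without_na, all_blank))
```

The mean is taken over the non-missing values only, so a year with one blank cell is not pulled toward zero by it.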

Answer 1: (score: 0)

data.table offers a one-line solution that produces the required output.

Try the following:

df = read.table(text = 'A   B   C   D   Date    Year
14.99   9.99    3.99    2.99    1/1/2002    2002
10.99   8.99    3.99    2.99    1/1/2006    2006
14.99   9.99    NA      NA      1/1/2006    2006
14.99   9.99    3.99    2.99    1/1/1998    1998
14.99   12.99   3.99    2.99    12/25/2012  2012
10.99   10.99   3.99    2.99    4/1/2014    2014
14.99   9.99    3.99    2.99    4/15/2011   2011
14.99   12.99   NA      NA      9/27/2013   2013
14.99   12.99   NA      NA      5/2/2014    2014
14.99   12.99   3.99    2.99    6/17/2014   2014
14.99   12.99   NA      NA      6/7/2013    2013
14.99   12.99   3.99    2.99    3/1/2013    2013
14.99   9.99    3.99    2.99    11/17/2007  2007
14.99   9.99    3.99    2.99    1/1/1987    1987
19.99   17.99   5.99    4.99    6/13/2014   2014
10.99   7.99    3.99    2.99    2/11/2014   2014
14.99   12.99   3.99    2.99    5/9/2014    2014
NA      9.99    NA      2.99    1/1/2003    2003
14.99   9.99    3.99    2.99    1/1/2003    2003
14.99   9.99    3.99    2.99    11/2/2012   2012
14.99   12.99   3.99    2.99    7/17/2013   2013
14.99   12.99   3.99    2.99    7/1/1980    1980
10.99   8.99    3.99    2.99    9/30/2011   2011
NA      9.99    NA      2.99    1/1/1996    1996
14.99   12.99   NA      NA      3/7/2014    2014
14.99   9.99    3.99    2.99    7/29/1966   1966
NA      9.99    NA      NA      1/1/1966    1966
14.99   12.99   3.99    2.99    3/5/2013    2013
14.99   9.99    3.99    2.99    1/1/1998    1998
12.99   9.99    3.99    2.99    7/11/2007   2007
14.99   9.99    3.99    2.99    1/1/2004    2004
14.99   9.99    3.99    2.99    1/1/1992    1992
14.99   12.99   NA      NA      10/4/2013   2013
NA      NA      6.99    6.99    1/30/2015   2015
', header = T)

library(data.table)

dt = as.data.table(df)

dt[order(Year), list(A = mean(A, na.rm = TRUE),
          B = mean(B, na.rm = TRUE),
          C = mean(C, na.rm = TRUE),
          D = mean(D, na.rm = TRUE)), by = Year]

   Year        A        B    C    D
 1: 2002 14.99000  9.99000 3.99 2.99
 2: 2006 12.99000  9.49000 3.99 2.99
 3: 1998 14.99000  9.99000 3.99 2.99
 4: 2012 14.99000 11.49000 3.99 2.99
 5: 2014 14.56143 12.70429 4.39 3.39
 6: 2011 12.99000  9.49000 3.99 2.99
 7: 2013 14.99000 12.99000 3.99 2.99
 8: 2007 13.99000  9.99000 3.99 2.99
 9: 1987 14.99000  9.99000 3.99 2.99
10: 2003 14.99000  9.99000 3.99 2.99
11: 1980 14.99000 12.99000 3.99 2.99
12: 1996      NaN  9.99000  NaN 2.99
13: 1966 14.99000  9.99000 3.99 2.99
14: 2004 14.99000  9.99000 3.99 2.99
15: 1992 14.99000  9.99000 3.99 2.99
16: 2015      NaN      NaN 6.99 6.99
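
The rows printed above are listed in the order in which each year first appears in the data, not in ascending year order. If the result comes back that way, one option is data.table's `keyby` argument, which groups like `by` and then sorts the result by the grouping column. The sketch below assumes the data.table package is installed and uses a hypothetical four-row sample with only two price columns.

```r
library(data.table)

# Hypothetical sample shaped like the question's data; NA marks blank cells
dt <- data.table(
  A    = c(14.99, 10.99, 14.99, NA),
  B    = c(9.99, 8.99, 9.99, 9.99),
  Year = c(2006, 2002, 2006, 1996)
)

# keyby groups by Year and returns the groups sorted by Year;
# .SDcols limits the averaged columns to the price columns
res <- dt[, lapply(.SD, mean, na.rm = TRUE),
          keyby = Year, .SDcols = c("A", "B")]

print(res)
```

Chaining `[order(Year)]` onto a grouped result is an alternative that sorts without setting a key on the table.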