Sorting a BoxPlot by date

Time: 2017-02-16 17:52:42

Tags: r sorting date dplyr

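How do I sort by date?

My data looks like this:

EXPIRE_DATE is a character column, so I used mutate to create another column, Date, from it.

I think I'm close, but how do I sort it in ascending or descending order?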

       EXPIRE_DATE     mean        sd           Date
                 (chr)    (dbl)     (dbl)          (chr)
1             04/30/17 56.75132 103.75048     April 2017
2             08/30/17 30.36706  46.12009    August 2017
3             08/31/17 42.84366  67.79964    August 2017
4             12/30/17 26.88593  23.60440  December 2017
5             12/31/17 38.67540  58.72461  December 2017
6             02/28/18 42.50570  63.91448  February 2018
7             01/30/18 28.60205  44.85719   January 2018
8             01/31/18 70.80121 134.13060   January 2018
9             07/31/17 45.45389  77.15242      July 2017
10            06/30/17 47.73592  81.88312      June 2017
11            05/30/17 46.38233  53.73065       May 2017
12            05/31/17 52.25520  88.89367       May 2017
13            11/30/17 39.27158  66.40248  November 2017
14            10/31/17 40.43197  71.51545   October 2017
15            09/30/17 43.12762  79.27168 September 2017

The code that produced it:

list_mean_sd <- EXPIRING %>% 
                group_by(EXPIRE_DATE) %>% 
                summarize( mean = mean(TOTAL), sd = sd(TOTAL)  ) %>%
                mutate( Date = format(as.Date(EXPIRE_DATE, "%m/%d/%y"), format="%B %Y") )

My end goal is to create a box plot sorted by date, so that it doesn't look odd.

boxplot(mean ~ Date, data = list_mean_sd, outline = FALSE) 

This is what I get (image not preserved): the months appear on the x-axis in alphabetical order rather than in date order.

dput(head(EXPIRING, 15))
structure(list(KEY = c(9495, 9541, 9638, 9717, 9743, 
9921, 10048, 10053, 10061, 10067, 10254, 10343, 24825, 25016, 
25162), TOTAL = c(20, 240, 91.04, 20, 140, 100, 
301.2, 40, 540, 469.82, 40, 140, 133.09, 1700, 20), EXPIRE_DATE = c("11/30/17", 
"01/31/18", "01/31/18", "12/31/17", "12/31/17", "01/31/18", "04/30/17", 
"07/31/17", "01/31/18", "01/31/18", "01/31/18", "01/31/18", "01/31/18", 
"01/31/18", "06/30/17")), .Names = c("KEY", "TOTAL", 
"EXPIRE_DATE"), row.names = c(NA, 15L), class = "data.frame")

Added:

dput(head(list_mean_sd, 30))
structure(list(EXPIRE_DATE = c("01/30/18", "01/31/18", 
"02/28/18", "04/30/17", "05/30/17", "05/31/17", "06/30/17", "07/31/17", 
"08/30/17", "08/31/17", "09/30/17", "10/31/17", "11/30/17", "12/30/17", 
"12/31/17"), mean = c(28.6020454545455, 70.8012116673021, 42.5057014558283, 
56.751320667367, 46.3823270440252, 52.2552028540308, 47.7359164733179, 
45.4538902012763, 30.3670622064929, 42.843660721111, 43.1276177589063, 
40.4319721861389, 39.2715832825871, 26.8859251197214, 38.6753964550534
), sd = c(44.857189842357, 134.130597512432, 63.9144788499397, 
103.750483732426, 53.7306532607393, 88.8936749200348, 81.8831227378872, 
77.1524193002944, 46.1200886362958, 67.7996403857795, 79.2716764935199, 
71.5154547562237, 66.4024797158997, 23.6044043594643, 58.7246098554578
), Date = c("January 2018", "January 2018", "February 2018", 
"April 2017", "May 2017", "May 2017", "June 2017", "July 2017", 
"August 2017", "August 2017", "September 2017", "October 2017", 
"November 2017", "December 2017", "December 2017")), .Names = c("EXPIRE_DATE", 
"mean", "sd", "Date"), class = c("tbl_df", "data.frame"), row.names = c(NA, 
-15L))

1 answer:

Answer 0 (score: 0)

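Updated to match the discussion.

You can force boxplot to draw the groups in the order you want by making Date a factor whose levels are sorted the way you want. Here the levels are taken from the Date labels after ordering the rows by the parsed EXPIRE_DATE.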

DateOrder = order(as.Date(list_mean_sd$EXPIRE_DATE, "%m/%d/%y"))
list_mean_sd$Date = factor(list_mean_sd$Date, 
    levels = unique(list_mean_sd$Date[DateOrder]))
boxplot(mean ~ Date, data = list_mean_sd, cex.axis=0.65)
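
Since the question also asks about descending order, here is a minimal sketch of that variant. It uses a small stand-in data frame built from a few rows of the question's table (not the full data), and the names parsed, asc_levels and desc_levels are only illustrative. The month labels produced by "%B" depend on the R session's locale.

```r
# Small stand-in for list_mean_sd (a few rows copied from the question's table)
list_mean_sd <- data.frame(
  EXPIRE_DATE = c("01/31/18", "04/30/17", "12/31/17", "08/30/17"),
  mean = c(70.80121, 56.75132, 38.67540, 30.36706),
  stringsAsFactors = FALSE
)

# Parse the character dates once, then build the "Month Year" label from them
parsed <- as.Date(list_mean_sd$EXPIRE_DATE, "%m/%d/%y")
list_mean_sd$Date <- format(parsed, "%B %Y")

# Labels in ascending date order (same idea as DateOrder above)
asc_levels <- unique(list_mean_sd$Date[order(parsed)])

# Reversing the levels gives descending order on the x-axis
desc_levels <- rev(asc_levels)
list_mean_sd$Date <- factor(list_mean_sd$Date, levels = desc_levels)

# The plot follows the factor's level order, not alphabetical order
levels(list_mean_sd$Date)
boxplot(mean ~ Date, data = list_mean_sd, cex.axis = 0.65)
```

Calling levels() before plotting is a quick way to check the order that boxplot will use.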

[Image: Ordered Boxplot]
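
As a possible alternative to building the levels by hand, base R's reorder() can order the levels of a grouping variable by a summary of another vector. The sketch below is an illustration under assumptions rather than part of the original answer: it uses a small stand-in data frame copied from the question's table, and the name parsed is hypothetical. The dates are converted with as.numeric() so that the per-group mean used by reorder() is a plain number.

```r
# Small stand-in for list_mean_sd (rows copied from the question's table)
list_mean_sd <- data.frame(
  EXPIRE_DATE = c("01/30/18", "01/31/18", "04/30/17",
                  "05/30/17", "05/31/17", "12/31/17"),
  mean = c(28.60205, 70.80121, 56.75132, 46.38233, 52.25520, 38.67540),
  stringsAsFactors = FALSE
)

# Parse the character dates and build the "Month Year" label
parsed <- as.Date(list_mean_sd$EXPIRE_DATE, "%m/%d/%y")
list_mean_sd$Date <- format(parsed, "%B %Y")

# reorder() sorts the levels by the mean numeric date within each label,
# which puts the months in chronological order
list_mean_sd$Date <- reorder(list_mean_sd$Date, as.numeric(parsed))

# For descending order, negate the score instead:
# list_mean_sd$Date <- reorder(list_mean_sd$Date, -as.numeric(parsed))

boxplot(mean ~ Date, data = list_mean_sd, cex.axis = 0.65)
```

Both approaches end up with the same kind of object, a factor with date-ordered levels, so which one to use is mostly a matter of taste.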