When using ordered()

Time: 2016-05-17 14:35:43

Tags: r factors

I have a longitudinal data frame p with 5 variables, shown below:

> head(p)
    date.1           County.x providers beds    price
1 Jan/2011              essex       258 5545 251593.4
2 Jan/2011 greater manchester       108 3259 152987.7
3 Jan/2011               kent       301 7191 231985.7
4 Jan/2011      tyne and wear       103 2649 143196.6
5 Jan/2011      west midlands       262 6819 149323.9
6 Jan/2012              essex         2   27 231398.5

My variables have the following structure:

'data.frame':   259 obs. of  5 variables:
 $ date.1   : Factor w/ 66 levels "Apr/2011","Apr/2012",..: 23 23 23 23 23 24 24 24 25 25 ...
 $ County.x : Factor w/ 73 levels "avon","bedfordshire",..: 22 24 32 65 67 22 32 67 22 32 ...
 $ providers: int  258 108 301 103 262 2 9 2 1 1 ...
 $ beds     : int  5545 3259 7191 2649 6819 27 185 24 70 13 ...
 $ price    : num  251593 152988 231986 143197 149324 ...

I want to order date.1 chronologically. Before I apply ordered(), this variable contains no NA observations.

> summary(is.na(p$date.1))
   Mode   FALSE    NA's 
logical     259       0 

However, once I apply ordered() to put the levels of date.1 in order:

p$date.1 <- with(p, ordered(date.1, levels = c(
  "Jun/2010", "Jul/2010", "Aug/2010", "Sep/2010", "Oct/2010", "Nov/2010", "Dec/2010",
  "Jan/2011", "Feb/2011", "Mar/2011", "Apr/2011", "May/2011", "Jun/2011",
  "Jul/2011", "Aug/2011", "Sep/2011", "Oct/2011", "Nov/2011", "Dec/2011",
  "Jan/2012", "Feb/2012", "Mar/2012", "Apr/2012", "May/2012", "Jun/2012",
  "Jul/2012", "Aug/2012", "Sep/2012", "Oct/2012", "Nov/2012", "Dec/2012",
  "Jan/2013", "Feb/2013", "Mar/2013", "Apr/2013", "May/2013", "Jun/2013",
  "Jul/2013", "Aug/2013", "Sep/2013", "Oct/2013", "Nov/2013", "Dec/2013",
  "Jan/2014", "Feb/2014", "Mar/2014", "Apr/2014", "May/2014", "Jun/2014",
  "Jul/2014", "Aug/2014", "Sep/2014", "Oct/2014", "Nov/2014", "Dec/2014",
  "Jan/2015", "Feb/2015", "Mar/2015", "Apr/2015", "May/2015", "Jun/2015",
  "Jul/2015", "Aug/2015", "Sep/2015", "Oct/2015", "Nov/2015")))

I seem to lose some observations: 9 values are now NA.

> summary(is.na(p$date.1))
   Mode   FALSE    TRUE    NA's 
logical     250       9       0 

Has anyone run into this problem when using ordered()? Alternatively, is there any other way to group my observations chronologically?

1 answer:

Answer 0: (score: 1)

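At least one of the values in p$date.1 probably does not match any of the levels you supplied; ordered() turns every such value into NA. Try using this ord.mon as the levels instead.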

ord.mon <- do.call(paste, c(expand.grid(month.abb, 2010:2015), sep = "/"))
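To illustrate why NA values appear, here is a minimal sketch with made-up data. The vectors x and lv are hypothetical and not taken from the question; the trailing space and the out-of-range month are only guesses at what a mismatch could look like.

```r
# Hypothetical toy values: "Jan/2011 " has a trailing space,
# and "Dec/2015" is simply absent from the levels below.
x  <- c("Jan/2011", "Jan/2011 ", "Dec/2015", "Feb/2011")
lv <- c("Jan/2011", "Feb/2011")

# Any value that is not exactly equal to one of the levels becomes NA.
o <- ordered(x, levels = lv)
is.na(o)
```

Only exact string matches survive, so differences in case, whitespace, or spelling are enough to produce NA.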

Then you can run this check to see whether any value in p$date.1 fails to match ord.mon.

p$date.1 %in% ord.mon
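The check above returns one logical per row. To see which values are the offenders, something like the following sketch may help. The factor d is a hypothetical stand-in for p$date.1, and its misspelled entries are invented for illustration.

```r
# Candidate levels, built as in the answer (month.abb is always English).
ord.mon <- do.call(paste, c(expand.grid(month.abb, 2010:2015), sep = "/"))

# Hypothetical stand-in for p$date.1 with two deliberately bad entries.
d <- factor(c("Jan/2011", "jan/2012", "Feb/2011", "Sept/2013"))

# Distinct values that are not among the candidate levels.
bad <- setdiff(as.character(d), ord.mon)
bad

# Number of rows affected; with the real data this would be compared
# against the NA count reported after ordered().
sum(!d %in% ord.mon)
```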

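Finally, you can also sort the data frame after converting the date.1 column to Date (note that you have to prepend an actual day first, because a month and year alone do not make a complete date):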

p <- p[order(as.Date(paste0("01/", p$date.1), "%d/%b/%Y")), ]
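If an ordered factor is still wanted rather than a sorted data frame, the levels can be derived from the data instead of typed by hand. The sketch below is one possible approach, not the answer's method: it looks months up in month.abb rather than calling as.Date, because "%b" depends on the session locale. The vector d is a hypothetical stand-in for p$date.1.

```r
# Hypothetical stand-in for as.character(p$date.1).
d <- c("Jan/2012", "Jun/2010", "Jan/2011", "Jun/2010")

# Split "Mon/YYYY" into a month number and a year.
mon <- match(sub("/.*", "", d), month.abb)
yr  <- as.integer(sub(".*/", "", d))

# Distinct labels in chronological order, used as the factor levels.
lv <- unique(d[order(yr, mon)])
o  <- factor(d, levels = lv, ordered = TRUE)
levels(o)
```

Because the levels come from the values themselves, no observation can fall outside them. A label whose month is not a standard abbreviation gets an NA month number, which order() places last, so it is worth checking anyNA(mon) first.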