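Filtering a data.table by the month of a Date column

Asked: 2014-10-30 20:13:35

Tags: r data.table

I have a lot of outliers in January and December, so I want to exclude those months for now. This is my data.table: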

> str(statistics2)
Classes 'data.table' and 'data.frame':  1418 obs. of  4 variables:
 $ status: chr  "hire" "normal" "hire" "hire" ...
 $ month : Date, format: "1993-01-01" "1993-01-01" ...
 $ NOBS  : int  37459 765 12 16 24 17 2 12 2 11 ...

I tried to build a condition that checks the month, but I get the following error:

format(statistics2['month'], "%m")
Error in `[.data.table`(statistics2, "month") : 
  typeof x.month (double) != typeof i.month (character)

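3 answers:

Answer 0 (score: 2)

Since your question asks specifically about data.table, note that the data.table package has a built-in set of lubridate-like functions (load the package and type ?month, for example). You need neither format(...) nor lubridate.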

library(data.table)
DT <- data.table(status=c("hire","normal","hire"),
                 month=as.Date(c("1993-01-01","1993-06-01", "1993-12-01")),
                 NOBS=c(37459,765,12))
DT
#    status      month  NOBS
# 1:   hire 1993-01-01 37459
# 2: normal 1993-06-01   765
# 3:   hire 1993-12-01    12

DT[!(month(month) %in% c(1,12))]
#    status      month NOBS
# 1: normal 1993-06-01  765
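If the goal is only to set the outlier months aside temporarily, one option is to flag them rather than drop them. This is a minimal sketch that assumes the same toy DT as above; the column name is_edge_month is made up for illustration.

```r
library(data.table)

DT <- data.table(status = c("hire", "normal", "hire"),
                 month  = as.Date(c("1993-01-01", "1993-06-01", "1993-12-01")),
                 NOBS   = c(37459, 765, 12))

# Flag January and December by reference; is_edge_month is a hypothetical name.
DT[, is_edge_month := month(month) %in% c(1L, 12L)]

# Work on the remaining months without losing the flagged rows in DT.
kept <- DT[is_edge_month == FALSE]
```

DT still holds all three rows, so the flagged months can be brought back later by filtering on the same column.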

Answer 1 (score: 1)

Well, if statistics2 is a data.frame

statistics2 <- data.frame(status=c("hire","normal","hire"),
    month=as.Date(c("1993-01-01","1993-06-01", "1993-12-01")),
    NOBS=c(37459,765,12)
)

then you should use

format(statistics2[["month"]], "%m")
# [1] "01" "06" "12"

(Note the double brackets. With single brackets you get back a list, which format() cannot interpret correctly.)

If statistics2 is a data.table

statistics2dt <- data.table(statistics2)

then I would have expected statistics2dt['month'] to return a different error, but in any case the correct syntax is

format(statistics2dt[, month], "%m")
# [1] "01" "06" "12"

(no quotes around the column name, and with a leading comma)
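To tie this back to the original goal, here is a minimal sketch that pulls the column out as a plain vector in a few equivalent ways and then uses format() directly as a row filter. It assumes the same three-row toy data; the names m1, m2, m3 and filtered are only for illustration.

```r
library(data.table)

statistics2dt <- data.table(status = c("hire", "normal", "hire"),
                            month  = as.Date(c("1993-01-01", "1993-06-01", "1993-12-01")),
                            NOBS   = c(37459, 765, 12))

# Three ways to get the column as a plain Date vector.
m1 <- statistics2dt[, month]
m2 <- statistics2dt[["month"]]
m3 <- statistics2dt$month

# format() returns zero-padded character months, so compare against strings.
filtered <- statistics2dt[!(format(month, "%m") %in% c("01", "12"))]
```

Because format(x, "%m") returns character values such as "01", comparing against the numbers 1 and 12 would not match; the strings "01" and "12" are needed here.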

Answer 2 (score: 0)

You can use lubridate to extract the month and exclude those months from the data frame:

library(lubridate)

set.seed(0)
months <- round(runif(100, 1, 12), digits = 0)
years <- round(runif(100, 2013, 2014), digits = 0)
day <- round(runif(100, 2, 25), digits = 0)

dates <- paste(years, months, day, sep = "-")

dates <- as.Date(dates, "%Y-%m-%d")
NOBS <- round(runif(100, 1, 1000), digits = 0)

statistics2 <- cbind.data.frame(dates, NOBS)

months <- month(statistics2$dates)

# A logical mask is used instead of -which(...), which would return zero rows if no month matched.
excJanDec <- statistics2[!(months %in% c(1, 12)), ]
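A logical mask is generally the safer way to drop rows, because the -which(...) idiom returns zero rows when nothing matches. The sketch below illustrates this with base R only; the three dates are made-up sample data that deliberately contain no January or December entries.

```r
# Made-up dates with no January or December entries.
dates <- as.Date(c("2013-03-15", "2013-07-02", "2014-10-20"))
NOBS  <- c(10, 20, 30)
statistics2 <- data.frame(dates, NOBS)

# Month number from base R: POSIXlt counts months from 0, hence the + 1.
months <- as.POSIXlt(statistics2$dates)$mon + 1

# which() finds nothing, so the negative index is empty and every row is lost.
with_which <- statistics2[-which(months %in% c(1, 12)), ]

# The logical mask keeps all three rows, as intended.
with_mask <- statistics2[!(months %in% c(1, 12)), ]
```

The two forms agree whenever at least one row matches, which is why the problem tends to show up only on subsets of the data.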