Error with the `cut` function

Time: 2013-04-11 00:13:31

Tags: r cut

I am trying to group all San Francisco home sales by year, based on their dates. I am using the following code:

geo_big$month <- as.Date(paste0(strftime(geo_big$date, format = "%Y-%m"), "-01"))

geo_big$date_r <- cut(geo_big$month, breaks = as.Date(c("2003-04-01", "2004-01-01", "2005-01-01", "2006-01-01", "2007-01-01", "2008-11-01")), include.lowest = TRUE, labels = as.Date(c("2003-01 - 2004-12", "2004-01 - 2004-12", "2005-01 - 2005-12", "2006-01 - 2006-12", "2007-01 - 2007-12", "2008-01 - 2008-11")))

I get this message:

Error in charToDate(x) : 
  character string is not in a standard unambiguous format

Does anyone know what is going on?

1 answer:

Answer 0: (score: 0)

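The error message indicates that the problem is not with `cut` but with `as.Date`: it is complaining that it cannot determine the format of the date string it was given.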
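As a quick check, the same error can be reproduced without `cut` at all, by passing one of the label strings from the question straight to `as.Date`. This is only a small sketch; the `tryCatch` wrapper is there so the message can be captured instead of stopping the script.

```r
# Calling as.Date() on one of the label strings reproduces the error without cut()
msg <- tryCatch(
  as.Date("2003-01 - 2004-12"),
  error = function(e) conditionMessage(e)
)
print(msg)
```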

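More specifically, the culprit is what you pass as `labels`. There is no need to wrap it in `as.Date`.

The labels should be a `character` vector: `c(.)` with the strings in quotes is sufficient.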
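A minimal sketch of that fix, using a small made-up `geo_big` (the sample dates are assumptions, not the asker's data). The labels are a plain character vector. Note that `cut` expects one label per interval, so with six break points this sketch passes five labels; the label text is only illustrative.

```r
# Toy stand-in for the asker's data; these dates are invented for illustration
geo_big <- data.frame(
  date = as.Date(c("2003-05-17", "2004-08-02", "2006-03-30", "2008-10-15"))
)
geo_big$month <- as.Date(paste0(strftime(geo_big$date, format = "%Y-%m"), "-01"))

bks <- as.Date(c("2003-04-01", "2004-01-01", "2005-01-01",
                 "2006-01-01", "2007-01-01", "2008-11-01"))

# Labels are plain character strings: no as.Date() around them.
# Six breaks define five intervals, so five labels are supplied.
lbls <- c("2003-04 - 2003-12", "2004-01 - 2004-12", "2005-01 - 2005-12",
          "2006-01 - 2006-12", "2007-01 - 2008-11")

geo_big$date_r <- cut(geo_big$month, breaks = bks,
                      include.lowest = TRUE, labels = lbls)
print(geo_big)
```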


As a side note, the code above can be cleaned up in a few ways. The lubridate package may also be very useful to you.

# instead of: 
geo_big$month <- as.Date(paste0(strftime(geo_big$date, format = "%Y-%m"), "-01"))

# you can use `floor_date`: 
library(lubridate)
geo_big$month <- floor_date(geo_big$date, "month")  # from the `lubridate` pkg


# instead of: 
# ... a giant cut statement ...

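# use variables for ease of reading and debugging

# dmin and dmax are not defined in the original post;
# assumed here to be the first and last month in the data
dmin <- min(geo_big$month)
dmax <- max(geo_big$month)

# bks <- as.Date(c("2003-04-01", "2004-01-01", "2005-01-01", "2006-01-01", "2007-01-01", "2008-11-01")) 
# or: 
bks <- c(dmin, seq.Date(ceiling_date(dmin, "year"), floor_date(dmax, "year"), by="year"), dmax)  # still using library(lubridate)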

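# basing your labels on your breaks helps guard against human error & typos
# start of each interval's year (the last break is dropped, since it only closes the final interval)
lbls <- head(floor_date(bks, "year"), -1)
# end of each interval: the day before the next start, with dmax closing the last one
lbls <- paste( substr(lbls, 1, 7),   substr(c(lbls[-1] - 1, dmax), 1, 7), sep=" - ")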

# a cleaner, more readable `cut` statement
cut(geo_big$month, breaks=bks, include.lowest=TRUE, labels=lbls)
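
To see how the pieces fit together, here is an end-to-end sketch on a small made-up data frame. The sample dates, and the choice of `dmin` and `dmax` as the first and last month in the data, are assumptions for illustration; the rest follows the steps above and requires the lubridate package.

```r
library(lubridate)

# Toy stand-in for the asker's data; these dates are invented for illustration
geo_big <- data.frame(
  date = as.Date(c("2003-04-09", "2004-08-02", "2006-03-30", "2008-11-20"))
)
geo_big$month <- floor_date(geo_big$date, "month")

# Assumption: dmin and dmax are the first and last month present in the data
dmin <- min(geo_big$month)
dmax <- max(geo_big$month)

# Breaks: dmin, every January 1st in between, then dmax
bks <- c(dmin,
         seq.Date(ceiling_date(dmin, "year"), floor_date(dmax, "year"), by = "year"),
         dmax)

# Labels derived from the breaks, formatted as "YYYY-MM - YYYY-MM"
lbls <- head(floor_date(bks, "year"), -1)
lbls <- paste(substr(lbls, 1, 7), substr(c(lbls[-1] - 1, dmax), 1, 7), sep = " - ")

geo_big$date_r <- cut(geo_big$month, breaks = bks, include.lowest = TRUE, labels = lbls)
print(lbls)
print(table(geo_big$date_r))
```

With these sample dates the first label comes out as "2003-01 - 2003-12" and the last as "2008-01 - 2008-11". Because `cut` on dates uses left-closed intervals by default, `include.lowest = TRUE` is what keeps the rows that fall exactly on `dmax` in the last group.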