Here is a sample of my data frame. I am working in R.
date name count
2016-11-12 Joe 5
2016-11-15 Bob 5
2016-06-15 Nick 12
2016-10-16 Cate 6
I want to add a column to my data frame that tells me the season corresponding to each date. I would like it to look like this:
date name count Season
2016-11-12 Joe 5 Winter
2016-11-15 Bob 5 Winter
2017-06-15 Nick 12 Summer
2017-10-16 Cate 6 Fall
I have started on some code:
startWinter <- c(month.name[1], month.name[12], month.name[11])
startSummer <- c(month.name[5], month.name[6], month.name[7])
startSpring <- c(month.name[2], month.name[3], month.name[4])
# create a function to find the correct season based on the month
MonthSeason <- function(Month) {
# !is.na()
# ignores values with NA
# match()
# returns a vector of the positions of matches
# If the starting month matches a spring season, print "Spring". If the starting month matches a summer season, print "Summer" etc.
ifelse(!is.na(match(Month, startSpring)),
return("spring"),
return(ifelse(!is.na(match(Month, startWinter)),
"winter",
ifelse(!is.na(match(Month, startSummer)),
"summer","fall"))))
}
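This code gives me the season for a single month. I am not sure whether I am approaching the problem the right way. Can anyone help? Thanks!
Answer 0 (score: 2)
There are a couple of hacks, and which one is usable depends on whether you want meteorological or astronomical seasons. I'll offer both; I think together they provide enough flexibility.
I'm going to use the second data set you provided, since it covers more than just "Winter".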
txt <- "date name count
2016-11-12 Joe 5
2016-11-15 Bob 5
2017-06-15 Nick 12
2017-10-16 Cate 6"
dat <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
dat$date <- as.Date(dat$date)
The quick method works well when seasons are defined strictly by month.
metseasons <- c(
"01" = "Winter", "02" = "Winter",
"03" = "Spring", "04" = "Spring", "05" = "Spring",
"06" = "Summer", "07" = "Summer", "08" = "Summer",
"09" = "Fall", "10" = "Fall", "11" = "Fall",
"12" = "Winter"
)
metseasons[format(dat$date, "%m")]
# 11 11 06 10
# "Fall" "Fall" "Summer" "Fall"
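To get the column the question asks for, the lookup result can be assigned straight back to the data frame. This is a minimal sketch that rebuilds dat and metseasons from above so that it runs on its own; unname() is only there to drop the month codes that would otherwise be carried along as names.

```r
# Rebuild the example data so this block is self-contained
dat <- data.frame(
  date  = as.Date(c("2016-11-12", "2016-11-15", "2017-06-15", "2017-10-16")),
  name  = c("Joe", "Bob", "Nick", "Cate"),
  count = c(5, 5, 12, 6),
  stringsAsFactors = FALSE
)

# Month code -> meteorological season (Northern hemisphere)
metseasons <- c(
  "01" = "Winter", "02" = "Winter",
  "03" = "Spring", "04" = "Spring", "05" = "Spring",
  "06" = "Summer", "07" = "Summer", "08" = "Summer",
  "09" = "Fall",   "10" = "Fall",   "11" = "Fall",
  "12" = "Winter"
)

# Look up each row's two-digit month and store the label as a new column
dat$Season <- unname(metseasons[format(dat$date, "%m")])
```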
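If you choose to use date ranges for seasons that do not start and stop on month boundaries, such as the astronomical seasons, here is another "hack":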
astroseasons <- as.integer(c("0000", "0320", "0620", "0922", "1221", "1232"))
astroseasons_labels <- c("Winter", "Spring", "Summer", "Fall", "Winter")
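If you use proper Date or POSIX types, then you are including years, which makes things a little less generic. One might think of using Julian dates, but those produce anomalies in leap years. So, on the assumption that Feb 28 is never a seasonal boundary, I am "numericizing" the month-day. Even though R does character comparisons just fine, cut needs numbers, so we convert them to integers.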
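The leap-year anomaly can be seen with a quick check. This is only an illustration (the variable names are made up here): the day of the year for March 20 differs between 2016 and 2017, while the month-day code does not.

```r
# March 20 in a leap year (2016) and in a common year (2017)
d <- as.Date(c("2016-03-20", "2017-03-20"))

# The day of the year shifts by one once Feb 29 has passed ...
doy <- as.integer(format(d, "%j"))

# ... while the month-day code is the same in both years
md <- as.integer(format(d, "%m%d"))
```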
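Two safeguards: because cut is either right-open (and left-closed) or right-closed (and left-open), our two book-ends need to extend beyond the legal dates, ergo "0000" and "1232". Other techniques could work equally well here (e.g., using -Inf and Inf after converting to integers).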
astroseasons_labels[ cut(as.integer(format(dat$date, "%m%d")), astroseasons, labels = FALSE) ]
# [1] "Fall" "Fall" "Spring" "Fall"
Notice that the third date falls in Spring when using astronomical seasons, and in Summer otherwise.
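The -Inf/Inf alternative mentioned earlier might look like the sketch below. The names astro_breaks and astro_labels are made up for this illustration; the boundary values are the same ones used in astroseasons.

```r
# Same boundaries as astroseasons, but with infinite book-ends instead of
# the artificial "0000" and "1232" codes
astro_breaks <- c(-Inf, 320, 620, 922, 1221, Inf)
astro_labels <- c("Winter", "Spring", "Summer", "Fall", "Winter")

d  <- as.Date(c("2016-11-12", "2016-11-15", "2017-06-15", "2017-10-16"))
md <- as.integer(format(d, "%m%d"))

# cut() returns the interval index, which then picks the label
season <- astro_labels[cut(md, astro_breaks, labels = FALSE)]
```

With the default right = TRUE, each interval is closed on the right, so a boundary day such as March 20 (code 320) still counts toward the earlier season.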
This solution can easily be adjusted for the Southern hemisphere or for other seasonal preferences/beliefs.
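As one possible adjustment, the labels can be swapped for the Southern hemisphere while the breaks stay untouched. This sketch assumes the same boundary dates apply and that only the season names are reversed; astroseasons_south is a name invented for the example.

```r
# Same month-day boundaries as before
astroseasons <- as.integer(c("0000", "0320", "0620", "0922", "1221", "1232"))

# Assumption: Southern-hemisphere seasons are the opposite of the Northern ones
astroseasons_south <- c("Summer", "Fall", "Winter", "Spring", "Summer")

d <- as.Date(c("2016-11-12", "2016-11-15", "2017-06-15", "2017-10-16"))
season_south <- astroseasons_south[
  cut(as.integer(format(d, "%m%d")), astroseasons, labels = FALSE)
]
```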
Edit: prompted by @Kristofersen's answer (thanks), I looked into benchmarks. lubridate::month uses a POSIXct-to-POSIXlt conversion to extract the month, which is about 10 times faster than my format(x, "%m") method. So:
metseasons2 <- c(
"Winter", "Winter",
"Spring", "Spring", "Spring",
"Summer", "Summer", "Summer",
"Fall", "Fall", "Fall",
"Winter"
)
Note that as.POSIXlt returns 0-based months, so we add 1:
metseasons2[ 1 + as.POSIXlt(dat$date)$mon ]
# [1] "Fall" "Fall" "Summer" "Fall"
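The same positional lookup can follow the grouping from the question instead (Nov-Jan as Winter, Feb-Apr as Spring, May-Jul as Summer, Aug-Oct as Fall), which reproduces the desired output shown there. A sketch, with qseasons as a hypothetical name:

```r
# One label per month, January..December, using the question's own grouping
qseasons <- c(
  "Winter",
  "Spring", "Spring", "Spring",
  "Summer", "Summer", "Summer",
  "Fall", "Fall", "Fall",
  "Winter", "Winter"
)

d <- as.Date(c("2016-11-12", "2016-11-15", "2017-06-15", "2017-10-16"))

# $mon is 0-based, so add 1 before indexing
season <- qseasons[1 + as.POSIXlt(d)$mon]
```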
Comparison:
library(lubridate)
library(microbenchmark)
set.seed(42)
x <- Sys.Date() + sample(1e3)
xlt <- as.POSIXlt(x)
microbenchmark(
metfmt = metseasons[ format(x, "%m") ],
metlt = metseasons2[ 1 + xlt$mon ],
astrofmt = astroseasons_labels[ cut(as.integer(format(x, "%m%d")), astroseasons, labels = FALSE) ],
astrolt = astroseasons_labels[ cut(100*(1+xlt$mon) + xlt$mday, astroseasons, labels = FALSE) ],
lubridate = sapply(month(x), seasons)  # seasons() is the function defined in the lubridate answer
)
# Unit: microseconds
# expr min lq mean median uq max neval
# metfmt 1952.091 2135.157 2289.63943 2212.1025 2308.1945 3748.832 100
# metlt 14.223 16.411 22.51550 20.0575 24.7980 68.924 100
# astrofmt 2240.547 2454.245 2622.73109 2507.8520 2674.5080 3923.874 100
# astrolt 42.303 54.702 72.98619 66.1885 89.7095 163.373 100
# lubridate 5906.963 6473.298 7018.11535 6783.2700 7508.0565 11474.050 100
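So the methods using as.POSIXlt(...)$mon are considerably faster. (@Kristofersen's answer could be improved by vectorizing it, perhaps with ifelse, but it still would not compare to the speed of the vector lookups, with or without cut.)
Answer 1 (score: 1)
If your data is in df: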
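For reference, a vectorized version along those lines might look like this. seasons_vec is a hypothetical rewrite of the seasons() function from the lubridate answer, not code taken from it, and it keeps that answer's month grouping.

```r
# Vectorized month-number -> season mapping built from nested ifelse() calls.
# Caveat: anything that matches none of the tests, including NA, falls
# through to "Winter".
seasons_vec <- function(m) {
  ifelse(m %in% 2:4,  "Spring",
  ifelse(m %in% 5:7,  "Summer",
  ifelse(m %in% 8:10, "Fall", "Winter")))
}

d <- as.Date(c("2016-11-12", "2016-11-15", "2017-06-15", "2017-10-16"))
season <- seasons_vec(1 + as.POSIXlt(d)$mon)
```

This avoids the per-element sapply() call, at the cost of evaluating every branch over the whole vector.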
# create dataframe for month and corresponding season
dfSeason <- data.frame(season = c(rep("Winter", 3), rep("Summer", 3),
rep("Spring", 3), rep("Fall", 3)),
month = month.name[c(11,12,1, 5:7, 2:4, 8:10)],
stringsAsFactors = FALSE)
# make date as date
df$data <- as.Date(df$date)
# match the month of the date in df (format %B) with month in season
# then use it to index the season of dfSeason
df$season <- dfSeason$season[match(format(df$data, "%B"), dfSeason$month)]
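One caveat: format(x, "%B") returns month names in the current locale, while month.name is always English, so the match can fail on a non-English system. A possible locale-independent variant matches on month numbers instead. This is a sketch with made-up example data, keeping the same season grouping as above:

```r
# Lookup table keyed by month number rather than month name
dfSeason <- data.frame(
  season = rep(c("Winter", "Summer", "Spring", "Fall"), each = 3),
  month  = c(11, 12, 1, 5:7, 2:4, 8:10),
  stringsAsFactors = FALSE
)

# Example data (assumed for illustration)
df <- data.frame(
  date = as.Date(c("2016-11-12", "2016-06-15", "2016-10-16")),
  stringsAsFactors = FALSE
)

# "%m" gives the month as digits, which does not depend on the locale
df$season <- dfSeason$season[
  match(as.integer(format(df$date, "%m")), dfSeason$month)
]
```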
Answer 2 (score: 1)
You can do this fairly quickly with lubridate, plus a function that converts the month number to a season.
library(lubridate)
seasons = function(x){
if(x %in% 2:4) return("Spring")
if(x %in% 5:7) return("Summer")
if(x %in% 8:10) return("Fall")
if(x %in% c(11,12,1)) return("Winter")
}
dat$Season = sapply(month(dat$date), seasons)
> dat
date name count Season
1 2016-11-12 Joe 5 Winter
2 2016-11-15 Bob 5 Winter
3 2016-06-15 Nick 12 Summer
4 2016-10-16 Cate 6 Fall