Reading timestamp data in R from multiple time zones

Time: 2015-09-30 23:14:15

Tags: r datetime timezone

I have a column of time stamps in character format that looks like this:

2015-09-24 06:00:00 UTC

2015-09-24 05:00:00 UTC

dateTimeZone <- c("2015-09-24 06:00:00 UTC","2015-09-24 05:00:00 UTC")

I'd like to convert this character data into time data using POSIXct, and if I knew that all the time stamps were in UTC, I would do it like this:

dateTimeZone <- as.POSIXct(dateTimeZone, tz="UTC")

However, I don't necessarily know that all the time stamps are in UTC, so I tried

dateTimeZone <- as.POSIXct(dateTimeZone, format = "%Y-%m-%d %H:%M:%S %Z")

However, because strptime supports %Z only for output, this returns the following error:

Error in strptime(x, format, tz = tz) : use of %Z for input is not supported

I checked the documentation for the lubridate package, and I couldn't see that it handled this issue any differently than POSIXct.

Is my only option to check the time zone of each row and then use the appropriate time zone with something like the following?

temp <- as.POSIXct(rep(NA, length(dateTimeZone)))
isUTC <- grepl("UTC", dateTimeZone); isPDT <- grepl("PDT", dateTimeZone)
temp[isUTC] <- as.POSIXct(dateTimeZone[isUTC], tz="UTC")
temp[isPDT] <- as.POSIXct(dateTimeZone[isPDT], tz="America/Los_Angeles")

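3 Answers:

Answer 0: (score: 4)

You can do this by checking each row and processing it accordingly, then converting everything back to a consistent UTC time. (Edited to include matching time zone abbreviations to full time zone specifications.)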

dates <- c(
  "2015-09-24 06:00:00 UTC",
  "2015-09-24 05:00:00 PDT"
)

#extract timezone from dates
datestz <- vapply(strsplit(dates," "), tail, 1, FUN.VALUE="")

## Make a master list of abbreviation to 
## full timezone names. Used an arbitrary summer
## and winter date to try to catch daylight savings timezones.

tzabbrev <- vapply(
  OlsonNames(),
  function(x) c(
    format(as.POSIXct("2000-01-01",tz=x),"%Z"),
    format(as.POSIXct("2000-07-01",tz=x),"%Z")
  ),
  FUN.VALUE=character(2)
)
tmp <- data.frame(Olson=OlsonNames(), t(tzabbrev), stringsAsFactors=FALSE)
final <- unique(data.frame(tmp[1], abbrev=unlist(tmp[-1])))

## Do the matching:
out <- Map(as.POSIXct, dates, tz=final$Olson[match(datestz,final$abbrev)])
as.POSIXct(unlist(out), origin="1970-01-01", tz="UTC")
#  2015-09-24 06:00:00 UTC   2015-09-24 05:00:00 PDT 
#"2015-09-24 06:00:00 GMT" "2015-09-24 12:00:00 GMT" 

Answer 1: (score: 1)

A data.table solution:

library(data.table)

data <- data.table(dateTimeZone=c("2015-09-24 06:00:00 UTC",
                                  "2015-09-24 05:00:00 America/Los_Angeles"))
data[, timezone:=tstrsplit(dateTimeZone, split=" ")[[3]]]
data[, datetime.local:=as.POSIXct(dateTimeZone, tz=timezone), by=timezone]
data[, datetime.utc:=format(datetime.local, tz="UTC")]

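The key is to split the data on the time zone field so that you can feed each time zone group to as.POSIXct separately (I'm not sure why as.POSIXct won't let you give it a vector of time zones). Here I use data.table's efficient split-apply-combine syntax, but you could apply the same general concept in base R or with dplyr.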
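As a rough illustration of the base R route mentioned above, the following is a minimal sketch rather than a definitive implementation. The helper name parse_by_zone is hypothetical, and it assumes every string ends in a valid Olson time zone name (such as those returned by OlsonNames()), not an abbreviation like PDT.

```r
# Hypothetical helper (not part of the original answer): parse timestamps
# whose last space-separated token is an Olson time zone name.
parse_by_zone <- function(x) {
  # The time zone is assumed to be the last token of each string
  zone <- vapply(strsplit(x, " "), function(p) p[length(p)], character(1))
  # Drop the trailing time zone token, leaving only the date/time part
  stamp <- sub(" [^ ]+$", "", x)
  # Seconds since the epoch, filled in one time zone group at a time
  secs <- rep(NA_real_, length(x))
  for (z in unique(zone)) {
    idx <- zone == z
    # as.POSIXct takes a single tz, so each zone's rows are parsed together
    secs[idx] <- as.numeric(as.POSIXct(stamp[idx], tz = z))
  }
  # Reassemble as one POSIXct vector displayed in UTC
  as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
}

stamps <- c("2015-09-24 06:00:00 UTC",
            "2015-09-24 05:00:00 America/Los_Angeles")
parsed <- parse_by_zone(stamps)
format(parsed, "%Y-%m-%d %H:%M:%S", tz = "UTC")
# "2015-09-24 06:00:00" "2015-09-24 12:00:00"
```

Storing the intermediate result as numeric seconds avoids relying on how POSIXct vectors with different tz attributes are combined; the time zone only affects display once the instants are known.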

Answer 2: (score: 0)

Another approach, using lubridate:

library(stringr)
library(lubridate)

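normalize.timezone <- function(dates, target_tz = Sys.timezone()) {
    # The time zone is the last space-separated token (date-only
    # strings such as '2019-01-05 UTC-4' have just two tokens)
    tzones <- vapply(str_split(dates, ' '), tail, character(1), n = 1)
    # Strip the trailing time zone token, leaving only the date/time
    dts <- str_replace_all(dates, ' [\\w\\-\\/\\+]+$', '')
    # Parse each string in its own zone, then convert to the target zone
    tmp <- lapply(seq_along(dates), function(i) {
        with_tz(as.POSIXct(dts[ i ], tz = tzones[ i ]), target_tz)
    })
    # unlist() drops the POSIXct class, so restore the attributes
    final <- unlist(tmp)
    attributes(final) <- attributes(tmp[[ 1 ]])
    final
}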

dates <- c('2019-01-06 23:00:00 MST', 
           '2019-01-22 14:00:00 America/Los_Angeles', 
           '2019-01-05 UTC-4', 
           '2019-01-15 15:00:00 Europe/Moscow')
(normalize.timezone(dates, 'EST'))