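Converting a netCDF time variable to an R date object

Asked: 2017-09-01 13:43:48

Tags: r netcdf netcdf4 unidata

I have a netCDF file containing a time series, and the time variable has the following typical metadata: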

    double time(time) ;
            time:standard_name = "time" ;
            time:bounds = "time_bnds" ;
            time:units = "days since 1979-1-1 00:00:00" ;
            time:calendar = "standard" ;
            time:axis = "T" ;

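Inside R I want to convert the time to an R date object. At the moment I do this in a hardwired way: I read the units attribute, split the string, and use the third entry as my origin (thus assuming the spacing is "days", the time is 00:00, etc.):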

require("ncdf4")
f1<-nc_open("file.nc")
time<-ncvar_get(f1,"time")
tunits<-ncatt_get(f1,"time",attname="units")
tustr<-strsplit(tunits$value, " ")
dates<-as.Date(time,origin=unlist(tustr)[3])

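This hardwired solution works for my specific example, but I was hoping there might be an R package that properly handles the UNIDATA netCDF conventions for time units and converts them safely to an R date object?

3 answers:

Answer 0 (score: 3)

Not that I know of. I use this handy function built on lubridate, which does basically the same as yours.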

getNcTime <- function(nc) {
    require(lubridate)
    ncdims <- names(nc$dim) #get netcdf dimensions
    timevar <- ncdims[which(ncdims %in% c("time", "Time", "datetime", "Datetime", "date", "Date"))[1]] #find time variable
    #check before reading: indexing with a missing match yields NA, not a zero-length vector
    if (length(timevar)==0 || is.na(timevar)) stop("ERROR! Could not identify the correct time variable")
    times <- ncvar_get(nc, timevar)
    timeatt <- ncatt_get(nc, timevar) #get attributes
    timedef <- strsplit(timeatt$units, " ")[[1]]
    timeunit <- timedef[1]
    tz <- timedef[5]
    #compare the start time numerically; non-numeric parts become NA and are rejected
    timestart <- suppressWarnings(as.numeric(strsplit(timedef[4], ":")[[1]]))
    if (length(timestart) != 3 || anyNA(timestart) || timestart[1] > 24 || timestart[2] > 60 || timestart[3] > 60 || any(timestart < 0)) {
        warning(paste(timedef[4], "is not a valid start time. Assuming 00:00:00"))
        timedef[4] <- "00:00:00"
    }
    if (! tz %in% OlsonNames()) {
        warning(paste(tz, "is not a valid timezone. Assuming UTC"))
        tz <- "UTC"
    }
    timestart <- ymd_hms(paste(timedef[3], timedef[4]), tz=tz)
    f <- switch(tolower(timeunit), #Find the correct lubridate time function based on the unit
        seconds=seconds, second=seconds, sec=seconds,
        minutes=minutes, minute=minutes, min=minutes,
        hours=hours,     hour=hours,     h=hours,
        days=days,       day=days,       d=days,
        months=months,   month=months,   m=months,
        years=years,     year=years,     yr=years,
        NA
    )
    if (!is.function(f)) stop("Could not understand the time unit format")
    timestart + f(times)
}
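
The core of the function above is the mapping from the unit name to a time step, which is then added to the parsed origin. A minimal base-R sketch of just that step is shown below. The helper name `cf_offsets_to_posixct` is hypothetical (it is not part of ncdf4 or lubridate), and the sketch assumes a standard (Gregorian) calendar, an origin expressed in UTC, a time of day written as HH:MM:SS when present, and fixed-length units only (no months or years).

```r
# Hypothetical helper (not part of ncdf4 or lubridate): convert numeric
# offsets plus a CF-style units string into POSIXct, using base R only.
# Assumes a standard (Gregorian) calendar and an origin given in UTC.
cf_offsets_to_posixct <- function(offsets, units_string) {
    parts <- strsplit(trimws(units_string), "[[:space:]]+")[[1]]
    if (length(parts) < 3 || tolower(parts[2]) != "since")
        stop("Expected a string of the form '<unit> since <date> [<time>]'")
    # Seconds per unit; only fixed-length units are supported here
    secs_per_unit <- switch(tolower(parts[1]),
        seconds = , second = , secs = , sec = , s = 1,
        minutes = , minute = , mins = , min = 60,
        hours = , hour = , hrs = , hr = , h = 3600,
        days = , day = , d = 86400,
        stop("Unsupported time unit: ", parts[1])
    )
    # Use the time of day if present, otherwise assume midnight
    clock <- if (length(parts) >= 4) parts[4] else "00:00:00"
    origin <- as.POSIXct(paste(parts[3], clock),
                         format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
    if (is.na(origin)) stop("Could not parse the origin: ", parts[3], " ", clock)
    # Adding seconds to a POSIXct keeps sub-daily offsets intact
    origin + offsets * secs_per_unit
}

# Made-up offsets, using the units string from the question
dates <- cf_offsets_to_posixct(c(0, 1, 1.5), "days since 1979-1-1 00:00:00")
format(dates, "%Y-%m-%d %H:%M:%S")
```

With real data, `offsets` would be the result of `ncvar_get(nc, "time")` and `units_string` the `units` attribute read with `ncatt_get`. Working in seconds on a POSIXct origin keeps fractional days as times of day, which a Date object would not display.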

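Edit: you might also want to look at ncdf4.helpers::nc.get.time.series

EDIT2: Note that the newly proposed, currently in-development and awesome stars package handles dates automatically; see the first blog post for an example.

EDIT3: Another way is to use the units package directly, which is what stars uses. One could do something like this (it still does not handle calendars correctly, and I am not sure units can):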

getNcTime <- function(nc) { ##NEW VERSION, with the units package
    require(units)
    require(ncdf4)
    options(warn=1) #show warnings by default
    if (is.character(nc)) nc <- nc_open(nc)
    ncdims <- names(nc$dim) #get netcdf dimensions
    timevar <- ncdims[which(ncdims %in% c("time", "Time", "datetime", "Datetime", "date", "Date"))] #find (first) time variable
    if (length(timevar) > 1) {
        warning(paste("Found more than one time var. Using the first:", timevar[1]))
        timevar <- timevar[1]
    }
    if (length(timevar)!=1) stop("ERROR! Could not identify the correct time variable")
    times <- ncvar_get(nc, timevar) #get time data
    timeatt <- ncatt_get(nc, timevar) #get attributes
    timeunit <- timeatt$units
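    units(times) <- make_unit(timeunit) #make_unit() belongs to the units release current at the time; newer releases may need as_units() instead
    as.POSIXct(times) #convert the unit-tagged values (not the base function `time`)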
}

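Answer 1 (score: 3)

I couldn't get @AF7's function to work with my files, so I wrote my own. The function below creates a POSIXct date vector, reading the start date, time interval, unit and length from the nc file. It works with nc files of many (but probably not every...) shapes and forms.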

 ncdate <- function(nc) {
    ncdims <- names(nc$dim) #Extract dimension names
    timevar <- ncdims[which(ncdims %in% c("time", "Time", "datetime", "Datetime",
                                          "date", "Date"))[1]] # Pick the time dimension
    ntstep <-nc$dim[[timevar]]$len
    t <- ncvar_get(nc, timevar) # Extract the timestep count
    tunits <- ncatt_get(nc, timevar, "units") # Extract the long name of units
    tspace <- t[2] - t[1] # Calculate time period between two timesteps, for the "by" argument 
    tstr <- strsplit(tunits$value, " ") # Extract string components of the time unit
    a <- unlist(tstr[1]) # Isolate the unit, i.e. seconds, hours, days etc.
    uname <- a[which(a %in% c("seconds","hours","days"))[1]] # Check unit
    startd <- as.POSIXct(gsub(paste(uname,'since '),'',tunits$value),format="%Y-%m-%d %H:%M:%S") ## Extract the start / origin date
    tmulti <- 3600 # Declare hourly multiplier for date
    if (uname == "days") tmulti =86400 # Declare daily multiplier for date
    ## Rename "seconds" to "secs" for "by" argument and change the multiplier.
    if (uname == "seconds") {
        uname <- "secs"
        tmulti <- 1 }
    byt <- paste(tspace,uname) # Define the "by" argument
    if (byt == "0.0416666679084301 days") { ## If the unit is "days" but the "by" interval is in hours
        byt <- "1 hour"                     ## R won't understand "by < 1" so change by and unit to hour.
        uname <- "hours"
    }
    ## seq() for POSIXct has no "units" argument, so only "by" and "length.out" are passed
    datev <- seq(from=as.POSIXct(startd+t[1]*tmulti), by=byt, length.out=ntstep)
    datev # Return the date vector visibly
}
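
Because the function above builds its result with seq(), it assumes that the time steps are evenly spaced. As a hedged alternative sketch, the offsets can be added to the origin one by one, which also covers gaps and sub-daily steps. The origin, offsets and multiplier below are made-up illustration values, not taken from any real file.

```r
# Sketch: add each offset to the origin instead of generating a regular sequence.
# All values here are invented for illustration.
origin <- as.POSIXct("1979-01-01 00:00:00", tz = "UTC")
t <- c(0, 1, 3, 3.25)   # days since the origin, with a gap and a sub-daily step
tmulti <- 86400         # seconds per day

# A regular sequence built from a fixed step misses the gap in the time axis
regular <- seq(from = origin + t[1] * tmulti, by = "1 day", length.out = length(t))

# Offsets applied individually reproduce the stored time axis
exact <- origin + t * tmulti

format(regular, "%Y-%m-%d %H:%M")
format(exact, "%Y-%m-%d %H:%M")
```

The two vectors agree for the first two steps and then diverge, so comparing them is a quick way to check whether a file's time axis is actually regular.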

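Answer 2 (score: 0)

I just found out that there is a package called ncdf.tools, which has the function

    convertDateNcdf2R

which

    converts a time vector from a netCDF file, or a vector of Julian days (or seconds, minutes, hours) since a specified origin, into a POSIXct R vector.

This is quite helpful. More details are available here: https://rdrr.io/cran/ncdf.tools/man/convertDateNcdf2R.html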