Question

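Maybe I'm over-engineering this, but I'm building a function that automatically parses date columns depending on how many columns are passed in.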

Data:

 CreatedDate              LastModifiedDate
 2015-11-20T19:46:11.000Z 2015-11-20T19:46:11.000Z
 2015-11-21T02:54:54.000Z 2015-12-01T18:48:07.000Z
 2015-11-21T14:36:32.000Z 2015-11-21T14:36:32.000Z
 2015-11-21T16:03:41.000Z 2015-11-21T16:03:41.000Z
 2015-11-21T17:31:43.000Z 2015-11-21T17:55:13.000Z




require(lubridate)
require(magrittr)

parse_sf_hms <- function(subset) {
  if( is.null( ncol(subset) ) ){
    subset %>% ymd_hms(tz="America/New_York",quiet=TRUE) %>% as.Date(format="%m/%d/%Y") -> x
  return(x)
  } else {
    apply(subset, 2, function(x) x %>% ymd_hms(tz="America/New_York",quiet=TRUE) %>% as.Date(format="%m/%d/%Y") )
  return( x )
  }
}

The problem is that when I pass a single column (e.g. df[,1] or df[,c(CreatedDate)]), the function returns the correct result:

[1] "2015-11-20" "2015-11-21" "2015-11-21" "2015-11-21"
[5] "2015-11-21"

However, when I pass multiple columns (e.g. df[,c(1,2)] or df[,c('CreatedDate','LastModifiedDate')]), I get:

     CreatedDate LastModifiedDate
[1,]       16759            16759
[2,]       16760            16770
[3,]       16760            16760
[4,]       16760            16760
[5,]       16760            16760

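Why does the single vector come back as properly formatted dates while the apply version does not? Would lapply or rbind be a better fit here? I just want to understand the behavior.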

Answer 1

Try this:

parse_sf_hms <- function(subset) {
  if( is.null( ncol(subset) ) ){
    subset %>% ymd_hms(tz="America/New_York",quiet=TRUE) %>% as.Date(format="%m/%d/%Y") -> x
    return(x)
  } else {
    x <- lapply(subset, function(x) x %>% ymd_hms(tz="America/New_York",quiet=TRUE) %>% as.Date(format="%m/%d/%Y") )
    return( x )
  }
}

As thelatemail said, use lapply. Also, your function has a bug. This line:

apply(subset, 2, function(x) x %>% ymd_hms(tz="America/New_York",quiet=TRUE) %>% as.Date(format="%m/%d/%Y") )

needs to have its result assigned to x (and should call lapply instead of apply):

    x <- lapply(subset, function(x) x %>% ymd_hms(tz="America/New_York",quiet=TRUE) %>% as.Date(format="%m/%d/%Y") )
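Note that lapply returns a list, not a data frame. If you want to keep the data frame shape, one option is to assign the list back into the columns. Here is a minimal sketch of that idea. The sample data, the `to_date` helper, and the `parsed` name are made up for illustration, and `to_date` uses base R as a stand-in for the ymd_hms pipeline so the snippet runs without extra packages:

```r
# Hypothetical sample data mirroring the question's two timestamp columns.
df <- data.frame(
  CreatedDate      = c("2015-11-20T19:46:11.000Z", "2015-11-21T02:54:54.000Z"),
  LastModifiedDate = c("2015-11-20T19:46:11.000Z", "2015-12-01T18:48:07.000Z"),
  stringsAsFactors = FALSE
)

# Stand-in parser (assumption): base R instead of ymd_hms(). It reads the
# leading yyyy-mm-dd part of each string and ignores the trailing time.
to_date <- function(x) as.Date(x, format = "%Y-%m-%d")

parsed <- df
# `parsed[]` replaces the columns in place, so the result stays a data frame
# and each column keeps the Date class returned by to_date().
parsed[] <- lapply(df, to_date)

str(parsed)
```

In your function you could swap `to_date` for the existing `ymd_hms(...) %>% as.Date()` chain; the `[] <- lapply(...)` part is what preserves the data frame.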

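As for why this happens: apply coerces the data frame to a matrix and collapses the per-column results into a plain array, which drops the Date class. What you see are the underlying numbers (days since 1970-01-01) rather than Date objects. lapply returns a list, so each element keeps its class.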
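You can see the class being dropped with a small experiment. This is only a sketch with made-up data (`dates`, `m`, `one`, and `back` are illustrative names), using plain date strings and base R so it runs on its own:

```r
# Hypothetical two-column data frame of date strings.
dates <- data.frame(
  a = c("2015-11-20", "2015-11-21"),
  b = c("2015-11-20", "2015-12-01"),
  stringsAsFactors = FALSE
)

# apply() combines the per-column results into a matrix; the Date class is
# lost along the way and only the underlying day counts remain.
m <- apply(dates, 2, function(col) as.Date(col))
is.matrix(m)   # TRUE
is.numeric(m)  # TRUE

# The same thing happens whenever a list of Dates is flattened.
one <- unlist(list(as.Date("2015-11-20")))
class(one)     # "numeric"

# The numbers are days since 1970-01-01, so they can be converted back.
back <- as.Date(m[1, "a"], origin = "1970-01-01")
back == as.Date("2015-11-20")
```

Converting back with an origin works, but it is simpler to avoid apply for this and never lose the class in the first place.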
