dplyr

Date: 2016-04-14 15:03:37

Tags: r dplyr

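I am trying to use as.Date() to derive the day from timestamps recorded in the UTC time zone. In a grouped tbl_df object this sometimes produces an unexplained NA, but not if I wrap the same object in data.frame() or ungroup(), or filter it. My example follows. The grouped tbl_df object is checkit, and the bad observation is #3, for wcid = 148. Nothing is unusual about its timestamp, yet as.Date() returns NA for it unless I transform checkit as described above: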

> checkit
Source: local data frame [6 x 3]
Groups: wcid, ab_split_test [6]

   wcid ab_split_test   mailing_timestamp
  (dbl)         (chr)              (time)
1     1             N                <NA>
2    78             Y 2016-04-04 12:28:58
3   148             Y 2016-03-17 09:11:31
4   204             Y 2016-03-04 09:01:15
5   255             Y 2016-03-03 09:18:43
6   267             Y 2016-03-23 09:16:50
> class(checkit)
[1] "grouped_df" "tbl_df"     "tbl"        "data.frame"
> checkit %>% mutate(treatment_day_actual = as.Date(mailing_timestamp))
Source: local data frame [6 x 4]
Groups: wcid, ab_split_test [6]

   wcid ab_split_test   mailing_timestamp treatment_day_actual
  (dbl)         (chr)              (time)               (date)
1     1             N                <NA>                 <NA>
2    78             Y 2016-04-04 12:28:58           2016-04-04
3   148             Y 2016-03-17 09:11:31                 <NA>
4   204             Y 2016-03-04 09:01:15           2016-03-04
5   255             Y 2016-03-03 09:18:43           2016-03-03
6   267             Y 2016-03-23 09:16:50           2016-03-23
> ungroup(checkit) %>% mutate(treatment_day_actual = as.Date(mailing_timestamp))
Source: local data frame [6 x 4]

   wcid ab_split_test   mailing_timestamp treatment_day_actual
  (dbl)         (chr)              (time)               (date)
1     1             N                <NA>                 <NA>
2    78             Y 2016-04-04 12:28:58           2016-04-04
3   148             Y 2016-03-17 09:11:31           2016-03-17
4   204             Y 2016-03-04 09:01:15           2016-03-04
5   255             Y 2016-03-03 09:18:43           2016-03-03
6   267             Y 2016-03-23 09:16:50           2016-03-23
> data.frame(checkit) %>% mutate(treatment_day_actual = as.Date(mailing_timestamp))
  wcid ab_split_test   mailing_timestamp treatment_day_actual
1    1             N                <NA>                 <NA>
2   78             Y 2016-04-04 12:28:58           2016-04-04
3  148             Y 2016-03-17 09:11:31           2016-03-17
4  204             Y 2016-03-04 09:01:15           2016-03-04
5  255             Y 2016-03-03 09:18:43           2016-03-03
6  267             Y 2016-03-23 09:16:50           2016-03-23
> filter(checkit, wcid == 148) %>% mutate(treatment_day_actual = as.Date(mailing_timestamp))
Source: local data frame [1 x 4]
Groups: wcid, ab_split_test [1]

   wcid ab_split_test   mailing_timestamp treatment_day_actual
  (dbl)         (chr)              (time)               (date)
1   148             Y 2016-03-17 09:11:31           2016-03-17

Here is the input:

> dput(checkit)
structure(list(wcid = c(1, 78, 148, 204, 255, 267), ab_split_test = c("N", 
"Y", "Y", "Y", "Y", "Y"), mailing_timestamp = structure(c(NA, 
1459787338.92449, 1458220291.82732, 1457100075.70328, 1457014723.60799, 
1458739010.74587), class = c("POSIXct", "POSIXt"), tzone = "")), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L), vars = list(
    wcid, ab_split_test), drop = TRUE, indices = list(0L, 1L, 
    2L, 3L, 4L, 5L), group_sizes = c(1L, 1L, 1L, 1L, 1L, 1L), biggest_group_size = 1L, labels = structure(list(
    wcid = c(1, 78, 148, 204, 255, 267), ab_split_test = c("N", 
    "Y", "Y", "Y", "Y", "Y")), class = "data.frame", row.names = c(NA, 
-6L), vars = list(wcid, ab_split_test), drop = TRUE, .Names = c("wcid", 
"ab_split_test")), .Names = c("wcid", "ab_split_test", "mailing_timestamp"
))
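
The dput() output above stores the grouping variables as bare symbols (vars = list(wcid, ab_split_test)), so it may not evaluate as pasted. A minimal sketch for rebuilding the ungrouped data from the same values, using base R only; the name checkit_plain is made up for illustration:

```r
# Rebuild the ungrouped data from the values in the dput() output above.
# checkit_plain is a hypothetical name, not part of the original session.
checkit_plain <- data.frame(
  wcid = c(1, 78, 148, 204, 255, 267),
  ab_split_test = c("N", "Y", "Y", "Y", "Y", "Y"),
  mailing_timestamp = as.POSIXct(
    c(NA, 1459787338.92449, 1458220291.82732, 1457100075.70328,
      1457014723.60799, 1458739010.74587),
    origin = "1970-01-01",
    tz = ""  # empty tzone, matching the dput() output
  ),
  stringsAsFactors = FALSE
)
```

With dplyr loaded, group_by(checkit_plain, wcid, ab_split_test) should restore the grouping, though the internal attributes may differ from those shown above depending on the dplyr version.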

I just noticed from the dput() output that the time zone is missing. When I query it, it shows my locale's time zone:

> attr(as.POSIXlt(checkit$mailing_timestamp),'tzone')
[1] ""    "EST" "EDT"

This should not be the case, because the sql argument in my dplyr::tbl() call specifically requests UTC, as in select mailing_timestamp at time zone 'UTC' as mailing_timestamp. I am connecting to a PostgreSQL database.
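
To take the session locale out of the picture, the time zone can be stated explicitly when converting. The sketch below uses the timestamp value for wcid = 148 from the dput() output. It does not explain the NA in the grouped case; it only shows the conversion with the time zone pinned, and the variable names ts and day_utc are made up for illustration:

```r
# Timestamp for wcid = 148, copied from the dput() output (seconds since epoch).
ts <- as.POSIXct(1458220291.82732, origin = "1970-01-01", tz = "UTC")

# as.Date() on a POSIXct accepts a tz argument; passing it explicitly avoids
# depending on the object's tzone attribute or the session locale.
day_utc <- as.Date(ts, tz = "UTC")

# Setting the tzone attribute relabels the display time zone without
# changing the underlying instant.
attr(ts, "tzone") <- "UTC"
print(day_utc)
```

The same tz = "UTC" argument could be passed inside the mutate() call, as in as.Date(mailing_timestamp, tz = "UTC"); whether that also avoids the NA on the grouped object is untested here.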

0 answers:

No answers