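I am trying to use as.Date() to derive the day from timestamps recorded in UTC. On a grouped tbl_df object this sometimes produces an unexplained NA, but not if I wrap the same object in data.frame() or ungroup(), or filter it. My example follows. The grouped tbl_df object is checkit, and the faulty observation is #3 (wcid = 148). Nothing about its timestamp is unusual, yet as.Date() returns NA for it unless I transform checkit as described above: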
> checkit
Source: local data frame [6 x 3]
Groups: wcid, ab_split_test [6]
wcid ab_split_test mailing_timestamp
(dbl) (chr) (time)
1 1 N <NA>
2 78 Y 2016-04-04 12:28:58
3 148 Y 2016-03-17 09:11:31
4 204 Y 2016-03-04 09:01:15
5 255 Y 2016-03-03 09:18:43
6 267 Y 2016-03-23 09:16:50
> class(checkit)
[1] "grouped_df" "tbl_df" "tbl" "data.frame"
> checkit %>% mutate(treatment_day_actual = as.Date(mailing_timestamp))
Source: local data frame [6 x 4]
Groups: wcid, ab_split_test [6]
wcid ab_split_test mailing_timestamp treatment_day_actual
(dbl) (chr) (time) (date)
1 1 N <NA> <NA>
2 78 Y 2016-04-04 12:28:58 2016-04-04
3 148 Y 2016-03-17 09:11:31 <NA>
4 204 Y 2016-03-04 09:01:15 2016-03-04
5 255 Y 2016-03-03 09:18:43 2016-03-03
6 267 Y 2016-03-23 09:16:50 2016-03-23
> ungroup(checkit) %>% mutate(treatment_day_actual = as.Date(mailing_timestamp))
Source: local data frame [6 x 4]
wcid ab_split_test mailing_timestamp treatment_day_actual
(dbl) (chr) (time) (date)
1 1 N <NA> <NA>
2 78 Y 2016-04-04 12:28:58 2016-04-04
3 148 Y 2016-03-17 09:11:31 2016-03-17
4 204 Y 2016-03-04 09:01:15 2016-03-04
5 255 Y 2016-03-03 09:18:43 2016-03-03
6 267 Y 2016-03-23 09:16:50 2016-03-23
> data.frame(checkit) %>% mutate(treatment_day_actual = as.Date(mailing_timestamp))
wcid ab_split_test mailing_timestamp treatment_day_actual
1 1 N <NA> <NA>
2 78 Y 2016-04-04 12:28:58 2016-04-04
3 148 Y 2016-03-17 09:11:31 2016-03-17
4 204 Y 2016-03-04 09:01:15 2016-03-04
5 255 Y 2016-03-03 09:18:43 2016-03-03
6 267 Y 2016-03-23 09:16:50 2016-03-23
> filter(checkit, wcid == 148) %>% mutate(treatment_day_actual = as.Date(mailing_timestamp))
Source: local data frame [1 x 4]
Groups: wcid, ab_split_test [1]
wcid ab_split_test mailing_timestamp treatment_day_actual
(dbl) (chr) (time) (date)
1 148 Y 2016-03-17 09:11:31 2016-03-17
Here is the input:
> dput(checkit)
structure(list(wcid = c(1, 78, 148, 204, 255, 267), ab_split_test = c("N",
"Y", "Y", "Y", "Y", "Y"), mailing_timestamp = structure(c(NA,
1459787338.92449, 1458220291.82732, 1457100075.70328, 1457014723.60799,
1458739010.74587), class = c("POSIXct", "POSIXt"), tzone = "")), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L), vars = list(
wcid, ab_split_test), drop = TRUE, indices = list(0L, 1L,
2L, 3L, 4L, 5L), group_sizes = c(1L, 1L, 1L, 1L, 1L, 1L), biggest_group_size = 1L, labels = structure(list(
wcid = c(1, 78, 148, 204, 255, 267), ab_split_test = c("N",
"Y", "Y", "Y", "Y", "Y")), class = "data.frame", row.names = c(NA,
-6L), vars = list(wcid, ab_split_test), drop = TRUE, .Names = c("wcid",
"ab_split_test")), .Names = c("wcid", "ab_split_test", "mailing_timestamp"
))
I just noticed from the dput() output that the time zone is missing. When I query it, it shows my locale:
> attr(as.POSIXlt(checkit$mailing_timestamp),'tzone')
[1] "" "EST" "EDT"
That should not be the case, because the sql argument in my dplyr::tbl() call explicitly requests UTC, as in select mailing_timestamp at time zone 'UTC' as mailing_timestamp. I am connecting to a PostgreSQL database.
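
For reference, a minimal base R sketch of how the tzone attribute can be inspected and set explicitly, using the row-3 value from the dput() output. It is only an illustration, not a diagnosis of the NA in the grouped case: the names ts_local, ts_utc and day_utc are made up here, and it assumes that passing tz to as.Date() explicitly is acceptable for this use.

```r
# Row-3 timestamp from the dput() output, with an empty tzone as in checkit
ts_local <- structure(1458220291.82732,
                      class = c("POSIXct", "POSIXt"), tzone = "")

# Same instant, tagged as UTC; only the attribute changes, not the value
ts_utc <- ts_local
attr(ts_utc, "tzone") <- "UTC"

# Ask for the calendar day with an explicit time zone
day_utc <- as.Date(ts_utc, tz = "UTC")

print(attr(ts_utc, "tzone"))
print(day_utc)
```

Setting tzone changes only how the instant is printed and converted; the underlying number of seconds since the epoch stays the same.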