Google表格导入R的日期和时间对象

时间:2014-04-20 08:09:58

标签: r google-docs-api date-conversion

我使用OmegaHat的RGoogleDocs软件包将我的Google表格电子表格导入R. 其中一些包括日期数据,时间数据和日期/时间数据。

导入到R后,RGoogleDocs将日期和时间转换为十进制数,大概是从开始参考1900-1-1(或1899-12-30)开始的秒数。

这不是日期的问题,因为我只需使用as.Date()进行转换即可。 但是,时间数据是一个巨大的麻烦。

我知道我可以使用as.POSIXct()as.POSIXlt(),但这并不能准确地将十进制数转换回日期和时间。

Observe:

Timestamp:
2/10/2014 19:22:40; 2/10/2014 19:52:07; 2/10/2014 20:14:47;

2/10/2014 21:08:03; 2/11/2014 11:05:39; 2/11/2014 12:21:40;

2/11/2014 12:44:33;  2/11/2014 16:08:54; 2/11/2014 16:31:44;

2/12/2014 10:54:31; 2/12/2014 11:38:15; 2/12/2014 11:56:18;

2/12/2014 12:31:53; 2/12/2014 14:42:46; 2/12/2014 15:03:36;

2/13/2014 14:38:17; 2/13/2014 15:08:45; 2/13/2014 16:39:27;

2/13/2014 17:18:45; 2/13/2014 19:14:23; 2/19/2014 15:01:16;

2/19/2014 15:05:43;

RGoogleDocs'导入版本:

> mydata$"Timestamp"
 [1] 41680.81 41680.83 41680.84 41680.88 41681.46 41681.52 41681.53 41681.67
 [9] 41681.69 41682.45 41682.48 41682.50 41682.52 41682.61 41682.63 41683.61
[17] 41683.63 41683.69 41683.72 41683.80 41689.63 41689.63

转换后:

    as.POSIXct(mydata$"Timestamp"*86400, origin = "1899-12-30")
 [1] "2014-02-11 00:22:40 PKT" "2014-02-11 00:52:07 PKT" "2014-02-11 01:14:47 PKT"
 [4] "2014-02-11 02:08:02 PKT" "2014-02-11 16:05:39 PKT" "2014-02-11 17:21:40 PKT"
 [7] "2014-02-11 17:44:33 PKT" "2014-02-11 21:08:54 PKT" "2014-02-11 21:31:44 PKT"
[10] "2014-02-12 15:54:31 PKT" "2014-02-12 16:38:15 PKT" "2014-02-12 16:56:18 PKT"
[13] "2014-02-12 17:31:53 PKT" "2014-02-12 19:42:45 PKT" "2014-02-12 20:03:36 PKT"
[16] "2014-02-13 19:38:17 PKT" "2014-02-13 20:08:45 PKT" "2014-02-13 21:39:27 PKT"
[19] "2014-02-13 22:18:45 PKT" "2014-02-14 00:14:22 PKT" "2014-02-19 20:01:16 PKT"
[22] "2014-02-19 20:05:42 PKT"

不仅时间不准确,错误也会逐渐增加。有人可以帮忙吗?

2 个答案:

答案 0 :(得分:0)

发布链接的Thx。我制作了一份副本(如果你现在想要,你可以取消共享:-)。尽管电子表格属性为GMT-8,但看起来RGoogleDocs正在为您提供GMT(UTC)中的数字位。我没有发布RGoogleDocs的所有设置代码,但我假设您执行了以下操作:

docs <- getDocs(sheets.con)
ts <- getWorksheets(docs$rrr, sheets.con)
dat <- sheetAsMatrix(ts$`Form Responses`, header = TRUE, as.data.frame = TRUE, trim = TRUE)
实际上,

RGoogleDocs确实发送了天数值与时间戳字符串的比较(尽管使用字符串版本导出CSV):

dat$Timestamp
##  [1] 41689.63 41683.72 41680.88 41680.81 41683.69 41681.67 41681.69 41682.45 41682.48 41682.50 41683.80
## [12] 41689.63 41680.83 41681.52 41681.46 41680.84 41682.52 41682.61 41683.61 41683.63 41681.53 41682.63

但是,我没有看到你的转换错误(可能是几秒钟):

as.POSIXct(dat$Timestamp*86400, origin="1899-12-30 00:00:00", tz="GMT")
##  [1] "2014-02-19 15:05:42 GMT" "2014-02-13 17:18:45 GMT" "2014-02-10 21:08:03 GMT"
##  [4] "2014-02-10 19:22:39 GMT" "2014-02-13 16:39:27 GMT" "2014-02-11 16:08:53 GMT"
##  [7] "2014-02-11 16:31:44 GMT" "2014-02-12 10:54:31 GMT" "2014-02-12 11:38:14 GMT"
## [10] "2014-02-12 11:56:18 GMT" "2014-02-13 19:14:23 GMT" "2014-02-19 15:01:15 GMT"
## [13] "2014-02-10 19:52:07 GMT" "2014-02-11 12:21:40 GMT" "2014-02-11 11:05:38 GMT"
## [16] "2014-02-10 20:14:46 GMT" "2014-02-12 12:31:52 GMT" "2014-02-12 14:42:45 GMT"
## [19] "2014-02-13 14:38:17 GMT" "2014-02-13 15:08:45 GMT" "2014-02-11 12:44:33 GMT"
## [22] "2014-02-12 15:03:36 GMT"

(从Google文档中粘贴CSV导出并使用屏幕截图)

Timestamp,DOA,DOD,Length of Stay,Time of Event,Date of Event,Presentation Time,Presentation Date,Time to Presentation
2/19/2014 15:05:43,9/6/2010,9/10/2010,4,,,0:30:00,9/6/2010,
2/13/2014 17:18:45,12/28/2009,1/1/2010,4,16:13:00,,20:59:00,12/28/2009,
2/10/2014 21:08:03,12/28/2009,1/2/2010,5,16:13:00,12/28/2009,17:51:00,12/28/2009,
2/10/2014 19:22:40,12/28/2009,1/1/2010,4,16:00:00,12/28/2009,22:00:00,12/28/2009,
2/13/2014 16:39:27,3/22/2013,3/29/2013,7,,,4:46:00,3/22/2013,
2/11/2014 16:08:54,2/21/2013,4/3/2013,41,18:00:00,2/16/2013,2:49:00,2/21/2013,
2/11/2014 16:31:44,2/21/2013,2/27/2013,6,18:00:00,,5:35:00,2/21/2013,
2/12/2014 10:54:31,2/21/2013,3/5/2013,12,,,5:35:00,2/21/2013,
2/12/2014 11:38:15,2/21/2013,3/12/2013,19,18:00:00,2/20/2013,3:14:00,2/21/2013,
2/12/2014 11:56:18,2/21/2013,2/27/2013,6,,,3:16:00,2/21/2013,
2/13/2014 19:14:23,3/4/2013,3/21/2013,17,19:00:00,,2:05:00,3/4/2013,
2/19/2014 15:01:16,3/3/2013,3/19/2013,16,22:00:00,,0:00:00,3/4/2013,
2/10/2014 19:52:07,3/3/2013,3/5/2013,2,,3/3/2013,22:16:00,3/3/2013,
2/11/2014 12:21:40,4/24/2013,5/28/2013,34,19:00:00,4/23/2013,23:30:00,4/24/2013,
2/11/2014 11:05:39,3/5/2013,3/13/2013,8,21:25:00,3/3/2013,4:17:00,3/5/2013,
2/10/2014 20:14:47,5/5/2013,6/3/2013,29,21:45:00,5/5/2013,0:45:00,5/5/2013,
2/12/2014 12:31:53,11/23/2013,11/29/2013,6,23:35:00,11/23/2013,1:01:00,11/23/2013,
2/12/2014 14:42:46,11/26/2013,12/11/2013,15,9:20:00,,22:18:00,11/26/2013,
2/13/2014 14:38:17,12/12/2013,12/24/2013,12,,,21:56:00,12/11/2013,
2/13/2014 15:08:45,3/4/2013,3/14/2013,10,,3/3/2013,2:30:00,3/4/2013,
2/11/2014 12:44:33,8/9/2013,8/20/2013,11,,8/8/2013,10:43:00,8/9/2013,
2/12/2014 15:03:36,9/28/2013,9/30/2013,2,,,12:14:00,9/28/2013,

您可以验证Google云端硬盘默认设置和电子表格本身的时区设置吗?这可能是个问题。

答案 1 :(得分:0)

等等,继续......我尝试使用工作表中剩余的时间数据进行计算,结果很好而不添加tz = "GMT"。然后我意识到Timestamp是Google自动生成的。这个电子表格链接到一个表单,我想跟踪我的研究进度,所以我建立了一个自动时间戳来加速数据收集。该时间戳按UTC发布。其余数据严格遵循PKT。 所以我不小心在同一个数据集中结束了两个不同的时区。

尽管如此,感谢@hrbrmstr让我意识到我需要考虑时区! 如果我遇到夏令时问题,我会和你联系。