Converting imported date-times from character to numeric without losing precision

Time: 2015-10-27 22:29:52

Tags: r

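I am using read.xlsx in a loop over multiple .xlsx files, which are aggregated into a single data frame. The problem is that when the date-time field is pulled into the data frame, it is stored as a character data type with high precision:

The data in the Excel file looks like this: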

10/09/2015 08:15:32
10/09/2015 08:15:33
10/09/2015 08:15:34
10/09/2015 08:15:35
10/09/2015 08:15:36
10/09/2015 08:15:37
10/09/2015 08:15:38
10/09/2015 08:15:39
10/09/2015 08:15:40
10/09/2015 08:15:41
10/09/2015 08:15:42
10/09/2015 08:15:43
10/09/2015 08:15:44
10/09/2015 08:15:45
10/09/2015 08:15:46
10/09/2015 08:15:47
10/09/2015 08:15:48
10/09/2015 08:15:49
10/09/2015 08:15:51
10/09/2015 08:15:52

The data after being read into the data frame:

class(dfCTS$DateTime)
# [1] "character"
print(dfCTS$DateTime[1:20])
#  [1] "42286.34412037037"  "42286.344131944446" "42286.344143518516" "42286.344155092593" "42286.344166666669" "42286.344178240739" "42286.344189814816" "42286.344201388885" "42286.344212962962" "42286.344224537039"
# [11] "42286.344236111108" "42286.344247685185" "42286.344259259262" "42286.344270833331" "42286.344282407408" "42286.344293981485" "42286.344305555554" "42286.344317129631" "42286.344340277778" "42286.344351851854"

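When I try to convert the character data to numeric and then to a date, some date-times go missing and I also get duplicates, which I believe is caused by a loss of precision or by excess precision.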

as.POSIXct(as.numeric(dfCTS$DateTime) * (60*60*24), origin = "1899-12-30", tz = "GMT")[1:20]
#  [1] "2015-10-09 08:15:32 GMT" "2015-10-09 08:15:33 GMT" "2015-10-09 08:15:34 GMT" "2015-10-09 08:15:35 GMT" "2015-10-09 08:15:36 GMT" "2015-10-09 08:15:37 GMT" "2015-10-09 08:15:38 GMT" "2015-10-09 08:15:38 GMT"
#  [9] "2015-10-09 08:15:40 GMT" "2015-10-09 08:15:41 GMT" "2015-10-09 08:15:41 GMT" "2015-10-09 08:15:43 GMT" "2015-10-09 08:15:44 GMT" "2015-10-09 08:15:45 GMT" "2015-10-09 08:15:46 GMT" "2015-10-09 08:15:47 GMT"
# [17] "2015-10-09 08:15:48 GMT" "2015-10-09 08:15:49 GMT" "2015-10-09 08:15:51 GMT" "2015-10-09 08:15:52 GMT"

Thanks in advance for your help.

1 answer:

Answer 0 (score: 0)

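I'm not sure whether you have a reason for all of those intermediate conversions, but if you want to read the data straight from the file into a POSIXct column, here are two ways to do it.

Option 1: Set up a custom class that uses as.POSIXct(). Just replace text = txt with your file name in the read.table() call below.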

setClass("myPOSIX")
setAs(
    "character", 
    "myPOSIX", 
    function(from) as.POSIXct(from, format = "%m/%d/%Y %T", tz = "GMT")
)    
df <- read.table(text = txt, sep = "\n", colClasses = "myPOSIX")
class(df[[1]])
# [1] "POSIXct" "POSIXt" 

Option 2: Use the readr package. This is my first time using this package, so hopefully I've done it the right way.

library(readr)
df <- read_csv(
    txt, 
    col_names = "date", 
    col_types = cols(date = col_datetime(format = "%m/%d/%Y %T"))
)
class(df$date)
# [1] "POSIXct" "POSIXt" 

Data:

txt <- "10/09/2015 08:15:32
10/09/2015 08:15:33
10/09/2015 08:15:34
10/09/2015 08:15:35
10/09/2015 08:15:36
10/09/2015 08:15:37
10/09/2015 08:15:38
10/09/2015 08:15:39
10/09/2015 08:15:40
10/09/2015 08:15:41
10/09/2015 08:15:42
10/09/2015 08:15:43
10/09/2015 08:15:44
10/09/2015 08:15:45
10/09/2015 08:15:46
10/09/2015 08:15:47
10/09/2015 08:15:48
10/09/2015 08:15:49
10/09/2015 08:15:51
10/09/2015 08:15:52"
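
As a side note, if the Excel serial numbers are already sitting in the data frame as character strings, the duplicates in the question look like a display effect rather than real data loss. A value such as "42286.344201388885" works out to just under 08:15:39 once multiplied by 86400, and R appears to truncate fractional seconds when printing POSIXct values, so it shows up as 08:15:38. Below is a minimal sketch that rounds to whole seconds before converting. It assumes the source data really has one-second resolution; the names serial_chr, secs, and dt are made up for illustration.

```r
# Excel serial date-times as character strings (first ten values from the question).
serial_chr <- c(
  "42286.34412037037",  "42286.344131944446", "42286.344143518516",
  "42286.344155092593", "42286.344166666669", "42286.344178240739",
  "42286.344189814816", "42286.344201388885", "42286.344212962962",
  "42286.344224537039"
)

# Convert days to seconds, then round to whole seconds BEFORE building the
# POSIXct value, so that 08:15:38.9999997 becomes 08:15:39 instead of being
# truncated to 08:15:38 on display.
# Assumption: the original timestamps have one-second resolution.
secs <- round(as.numeric(serial_chr) * 86400)
dt <- as.POSIXct(secs, origin = "1899-12-30", tz = "GMT")

print(format(dt, "%Y-%m-%d %H:%M:%S"))
print(anyDuplicated(dt))  # 0 means no duplicated timestamps
```

If the data had sub-second resolution, rounding to whole seconds would not be appropriate; rounding to the known resolution (for example round(x, 3) for milliseconds) would be the analogous step.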