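I am using read.xlsx to loop through multiple .xlsx files and aggregate them into a single data frame. The problem is that when the date-time field is pulled into the data frame, it is stored as a character vector of high-precision numbers rather than as dates.
The data in the Excel file looks like this: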
10/09/2015 08:15:32
10/09/2015 08:15:33
10/09/2015 08:15:34
10/09/2015 08:15:35
10/09/2015 08:15:36
10/09/2015 08:15:37
10/09/2015 08:15:38
10/09/2015 08:15:39
10/09/2015 08:15:40
10/09/2015 08:15:41
10/09/2015 08:15:42
10/09/2015 08:15:43
10/09/2015 08:15:44
10/09/2015 08:15:45
10/09/2015 08:15:46
10/09/2015 08:15:47
10/09/2015 08:15:48
10/09/2015 08:15:49
10/09/2015 08:15:51
10/09/2015 08:15:52
The data after being read into the data frame:
class(dfCTS$DateTime)
# [1] "character"
print(dfCTS$DateTime[1:20])
# [1] "42286.34412037037" "42286.344131944446" "42286.344143518516" "42286.344155092593" "42286.344166666669" "42286.344178240739" "42286.344189814816" "42286.344201388885" "42286.344212962962" "42286.344224537039"
# [11] "42286.344236111108" "42286.344247685185" "42286.344259259262" "42286.344270833331" "42286.344282407408" "42286.344293981485" "42286.344305555554" "42286.344317129631" "42286.344340277778" "42286.344351851854"
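When I try to convert the character data to numeric and then back to a date-time, the times no longer match the originals and I get duplicates, which I believe is caused by a loss of precision or by too much precision.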
as.POSIXct(as.numeric(dfCTS$DateTime) * (60*60*24), origin = "1899-12-30", tz = "GMT")[1:20]
# [1] "2015-10-09 08:15:32 GMT" "2015-10-09 08:15:33 GMT" "2015-10-09 08:15:34 GMT" "2015-10-09 08:15:35 GMT" "2015-10-09 08:15:36 GMT" "2015-10-09 08:15:37 GMT" "2015-10-09 08:15:38 GMT" "2015-10-09 08:15:38 GMT"
# [9] "2015-10-09 08:15:40 GMT" "2015-10-09 08:15:41 GMT" "2015-10-09 08:15:41 GMT" "2015-10-09 08:15:43 GMT" "2015-10-09 08:15:44 GMT" "2015-10-09 08:15:45 GMT" "2015-10-09 08:15:46 GMT" "2015-10-09 08:15:47 GMT"
# [17] "2015-10-09 08:15:48 GMT" "2015-10-09 08:15:49 GMT" "2015-10-09 08:15:51 GMT" "2015-10-09 08:15:52 GMT"
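Thanks in advance for your help.
Answer 0 (score: 0)
I'm not sure whether you have a reason for doing all of these intermediate conversions, but if you want to read the data from the file directly into a POSIXct column, here are two options.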
Option 1: Set up a custom class that uses as.POSIXct(). Just replace text = txt with your file name in the read.table() call below.
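# Register a class name and a character -> myPOSIX coercion;
# read.table() applies this coercion when the name is given in colClasses.
setClass("myPOSIX")
setAs(
  "character",
  "myPOSIX",
  function(from) as.POSIXct(from, format = "%m/%d/%Y %T", tz = "GMT")
)
# sep = "\n" keeps each line (date and time) together as a single field.
df <- read.table(text = txt, sep = "\n", colClasses = "myPOSIX")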
class(df[[1]])
# [1] "POSIXct" "POSIXt"
Option 2: Use the readr package. This is my first time using this package, so hopefully I have done it the right way.
library(readr)
df <- read_csv(
  txt,
  col_names = "date",
  col_types = cols(date = col_datetime(format = "%m/%d/%Y %T"))
)
class(df$date)
# [1] "POSIXct" "POSIXt"
Data:
txt <- "10/09/2015 08:15:32
10/09/2015 08:15:33
10/09/2015 08:15:34
10/09/2015 08:15:35
10/09/2015 08:15:36
10/09/2015 08:15:37
10/09/2015 08:15:38
10/09/2015 08:15:39
10/09/2015 08:15:40
10/09/2015 08:15:41
10/09/2015 08:15:42
10/09/2015 08:15:43
10/09/2015 08:15:44
10/09/2015 08:15:45
10/09/2015 08:15:46
10/09/2015 08:15:47
10/09/2015 08:15:48
10/09/2015 08:15:49
10/09/2015 08:15:51
10/09/2015 08:15:52"
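If the values have already been read in as Excel serial numbers, as in the question, the duplicates can probably be avoided by rounding to the nearest whole second before converting. Some of the serial values land a tiny fraction below the intended second once multiplied out, and the fractional part is then truncated when the date-time is printed. The following is only a minimal sketch under that assumption; it reuses three of the strings shown in the question, and the names serial, secs and dt are purely illustrative.

```r
# Three of the Excel serial date-times from the question, still as character.
serial <- c("42286.344189814816", "42286.344201388885", "42286.344212962962")

# Days since 1899-12-30 -> seconds, rounded to the nearest whole second
# so that values like 29738.9999997 become 29739 instead of being truncated.
secs <- round(as.numeric(serial) * 60 * 60 * 24)

# Build the POSIXct values from the rounded number of seconds.
dt <- as.POSIXct(secs, origin = "1899-12-30", tz = "GMT")

format(dt, "%Y-%m-%d %H:%M:%S")
# [1] "2015-10-09 08:15:38" "2015-10-09 08:15:39" "2015-10-09 08:15:40"
```

Without the round() call, the second of these three values is the one that shows up as 08:15:38 in the question's output. Rounding is only appropriate here because the source data has whole-second resolution.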