R as.POSIXct() dropping hours minutes and seconds

Asked: 2015-05-04 19:46:28

Tags: r posixct

I am experimenting with R to analyse some measurement data. I have a .csv file containing more than 2 million lines of measurement. Here is an example:

2014-10-22 21:07:03+00:00,7432442.0
2014-10-22 21:07:21+00:00,7432443.0
2014-10-22 21:07:39+00:00,7432444.0
2014-10-22 21:07:57+00:00,7432445.0
2014-10-22 21:08:15+00:00,7432446.0
2014-10-22 21:08:33+00:00,7432447.0
2014-10-22 21:08:52+00:00,7432448.0
2014-10-22 21:09:10+00:00,7432449.0
2014-10-22 21:09:28+00:00,7432450.0

After reading in the file, I want to convert the timestamps to proper date-times using as.POSIXct(). For small files this works fine, but for large files it does not.

I made an example by reading in a big file, copying a small portion of it, and then applying as.POSIXct() to the relevant column. I included an image of the file. As you can see, when applied to the temp variable it correctly keeps the hours, minutes and seconds. However, when applied to the whole file, only the date is stored. It also takes a lot of time (more than 2 minutes).

[Image: POSIXct() error]

What could cause this anomaly? Is it due to some system limit, since I'm running this on my laptop?

Edit

On my Windows 7 device I run R 3.1.3 which results in this error. However, on Ubuntu 14.01, running R 3.0.2, the times are kept for the large files. Just noticed there is a newer version (3.2.0) for Windows, will update and check if the issue persists.

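3 answers:

Answer 0 (score: 7)

Perhaps the cause of your problem is that somewhere in your dataset there is a date without a time. Try the following example: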

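  library(lubridate)
  # Five date-times (now + 1..5 minutes), converted to character
  dates <- as.character(now() + minutes(1:5))
  # Append one date that has no time component
  dates <- c(dates,"2015-05-10")
  # Times are kept when every element has a time ...
  as.POSIXct(dates[1:5])
  # ... but are dropped for all elements once a date-only entry is included
  as.POSIXct(dates)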

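First, a vector dates containing five date-times is created and converted to character. Then I add another date (as a character string) that has no time. When you run the two conversions to POSIXct, you will notice that the times disappear from the whole result as soon as a date without a time is included.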
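To locate such entries, one option is to parse with an explicit format so that strings lacking a time become NA instead of changing how the whole vector is parsed. This is a minimal base R sketch, assuming the timestamps look like the sample in the question; the vector `stamps` is made-up illustration data, not taken from the real file.

```r
# Hypothetical sample: two full timestamps and one date without a time
stamps <- c("2014-10-22 21:07:03+00:00",
            "2014-10-22 21:07:21+00:00",
            "2014-10-23")

# With an explicit format, entries that lack a time parse to NA;
# the trailing "+00:00" is ignored by the parser
parsed <- as.POSIXct(stamps, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Positions and values of the entries that have no time part
bad <- which(is.na(parsed))
bad
stamps[bad]
```

On the real data the same check would be run on the timestamp column, for example `data$V1` after reading it as character.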

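So it seems that the first rows of your data contain no dates without a time, but some may appear later on. There are most likely many solutions to this problem; I just want to propose one that came to mind.

The first step is to change your read command so that the dates are stored as character rather than factor: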

data <- read.csv("C:/RData/house2_electricity_Main.csv",header=FALSE,stringsAsFactors=FALSE)

Then you can add a time to every date that lacks one before converting to POSIXct:

# Date-only strings have 10 characters; append midnight (with a separating space) to those
data$V1 <- ifelse(nchar(data$V1) > 11, data$V1, paste(data$V1, "00:00:00"))
data$V1 <- as.POSIXct(data$V1)

This works for my small example above. It is not the most elegant solution; maybe someone has a better idea.
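A possible variant that avoids editing the strings is to parse twice: once with a date-time format, then once more with a date-only format for whatever failed. This is only a sketch in base R, not the answer's method; `stamps` is made-up illustration data and the choice of UTC is an assumption based on the `+00:00` suffix in the question.

```r
# Hypothetical sample: one full timestamp and one date without a time
stamps <- c("2014-10-22 21:07:03+00:00", "2014-10-23")

# First pass: full date-time format; date-only entries become NA
parsed <- as.POSIXct(stamps, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Second pass: fill the gaps using a date-only format (midnight UTC)
missing_time <- is.na(parsed)
parsed[missing_time] <- as.POSIXct(stamps[missing_time],
                                   format = "%Y-%m-%d", tz = "UTC")

format(parsed, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
```

Any entry that is still NA after the second pass matches neither format and would need a closer look.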

Answer 1 (score: 3)

You can try the following code. It will:

  • read the date-time column as character rather than factor
  • update by reference

library(data.table)
data <- fread("C:/RData/house2_electricity_main.csv")
data[, V1 := as.POSIXct(V1)]

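There was a recent question about using fasttime::fastPOSIXct in place of as.POSIXct, which may speed things up.

As for the title question: with POSIXct you can round the value quite freely, for example with the functions year, month and mday ...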

data[, .SD, by = .(year(V1),month(V1),mday(V1))]

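Answer 2 (score: 2)

I ran into a similar problem where as.POSIXlt(X) dropped the hour:minute:second information, where X was a vector of POSIXct objects that happened to have tzone="UTC".

However, as.POSIXlt(X, tz="UTC") kept the hour:minute:second information.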
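A small sketch of the call with the time zone passed explicitly, in base R. The value of `X` here is illustrative only, and whether the call without `tz` misbehaves may depend on the R version, so only the explicit form is shown.

```r
# A POSIXct value carrying tzone = "UTC" (illustrative data)
X <- as.POSIXct("2014-10-22 21:07:03", tz = "UTC")

# Convert to POSIXlt, passing the time zone explicitly
lt <- as.POSIXlt(X, tz = "UTC")

# The time-of-day components are available as list fields
c(hour = lt$hour, min = lt$min, sec = lt$sec)
```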