Using lubridate to subtract the entries in a column from the first entry

Posted: 2015-10-02 18:44:47

Tags: r lubridate

My data looks like this:

Dates                   another column
2015-05-13 23:53:00     some values
2015-05-13 23:53:00     ....
2015-05-13 23:33:00
2015-05-13 23:30:00
...
2003-01-06 00:01:00
2003-01-06 00:01:00

I then run the following code:

trainDF<-read.csv("train.csv") 
diff<-as.POSIXct(trainDF[1,1])-as.POSIXct(trainDF[,1])
head(diff)
Time differences in hours
[1] 23.88333 23.88333 23.88333 23.88333 23.88333 23.88333

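However, this doesn't make sense. Subtracting the first two entries should give 0, because they are exactly the same time, and subtracting the 3rd entry from the 1st should give a difference of 20 minutes, not 23.88333 hours. I get similar values when I try as.duration(diff) and as.numeric(diff). Why is this?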

1 Answer:

Answer 0 (score: 0)

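If you just have a series of dates in POSIXct, you can use the diff function to compute the difference between each pair of adjacent dates. Here is an example: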

> BD <- as.POSIXct("2015-01-01 12:00:00", tz = "UTC") # Making a begin date.
> ED <- as.POSIXct("2015-01-01 13:00:00", tz = "UTC") # Making an end date.
> timeSeq <- seq(BD, ED, "min") # Creating a time series in between the dates by minute.
> 
> head(timeSeq) # To see what it looks like.
[1] "2015-01-01 12:00:00 UTC" "2015-01-01 12:01:00 UTC" "2015-01-01 12:02:00 UTC" "2015-01-01 12:03:00 UTC" "2015-01-01 12:04:00 UTC"
[6] "2015-01-01 12:05:00 UTC"
> 
> diffTime <- diff(timeSeq) # Takes the difference between each adjacent time in the time series.
> print(diffTime) # Printing out the result.
Time differences in mins
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
> 
> # For the sake of example, let's make a hole in the data.
> 
> limBD <- as.POSIXct("2015-01-01 12:15:00", tz = "UTC") # Start of the hole we want to create. 
> limED <- as.POSIXct("2015-01-01 12:45:00", tz = "UTC") # End of the hole we want to create.
> 
> timeSeqLim <- timeSeq[timeSeq <= limBD | timeSeq >= limED] # Make a hole of 1/2 hour in the sequence.
> 
> diffTimeLim <- diff(timeSeqLim) # Taking the diff.
> print(diffTimeLim) # There is now a large gap, which is reflected in the print out.
Time differences in mins
 [1]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 30  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
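The gap shows up as the 30 in the printout. To locate such gaps programmatically, one option is to convert the differences to plain numbers in a fixed unit and compare them with the expected step. A minimal sketch, reusing the example's object names and assuming the expected step is 1 minute (`stepMins`, `gapIdx` and `gaps` are illustrative names):

```r
# Rebuild the example sequence with a 30-minute hole (same values as above)
timeSeq <- seq(as.POSIXct("2015-01-01 12:00:00", tz = "UTC"),
               as.POSIXct("2015-01-01 13:00:00", tz = "UTC"), by = "min")
limBD <- as.POSIXct("2015-01-01 12:15:00", tz = "UTC")
limED <- as.POSIXct("2015-01-01 12:45:00", tz = "UTC")
timeSeqLim <- timeSeq[timeSeq <= limBD | timeSeq >= limED]

# Differences as plain numbers in minutes, whatever unit diff() picked
stepMins <- as.numeric(diff(timeSeqLim), units = "mins")

# Positions where the step is larger than the assumed 1-minute spacing
gapIdx <- which(stepMins > 1)

# Each gap runs from timeSeqLim[gapIdx] to timeSeqLim[gapIdx + 1]
gaps <- data.frame(start   = timeSeqLim[gapIdx],
                   end     = timeSeqLim[gapIdx + 1],
                   minutes = stepMins[gapIdx])
print(gaps)
```

Converting with `units = "mins"` before comparing keeps the threshold meaningful even if the automatically chosen unit of the difftime object changes.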

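However, on rereading your post, it seems you just want to subtract every item after the first row from the first row. I did that with the same sample I used above: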

> timeSeq[1] - timeSeq[2:length(timeSeq)]
Time differences in mins
 [1]  -1  -2  -3  -4  -5  -6  -7  -8  -9 -10 -11 -12 -13 -14 -15 -16 -17 -18 -19 -20 -21 -22 -23 -24 -25 -26 -27 -28 -29 -30 -31 -32 -33 -34 -35 -36
[37] -37 -38 -39 -40 -41 -42 -43 -44 -45 -46 -47 -48 -49 -50 -51 -52 -53 -54 -55 -56 -57 -58 -59 -60

This gives me what I expect. Trying it with the data.frame approach:

> timeDF <- data.frame(time = timeSeq)
> timeDF[1,1] - timeDF[, 1]
Time differences in secs
 [1]     0   -60  -120  -180  -240  -300  -360  -420  -480  -540  -600  -660  -720  -780  -840  -900  -960 -1020 -1080 -1140 -1200 -1260 -1320 -1380
[25] -1440 -1500 -1560 -1620 -1680 -1740 -1800 -1860 -1920 -1980 -2040 -2100 -2160 -2220 -2280 -2340 -2400 -2460 -2520 -2580 -2640 -2700 -2760 -2820
[49] -2880 -2940 -3000 -3060 -3120 -3180 -3240 -3300 -3360 -3420 -3480 -3540 -3600
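Note that the printed unit changes from mins to secs between these two outputs. Subtracting POSIXct values calls `difftime()` with `units = "auto"`, which picks the unit from the smallest absolute non-missing difference; including the first row gives a difference of 0, which selects seconds. A minimal sketch that fixes the unit explicitly so outputs stay comparable (`fromFirst` and `minsFromFirst` are illustrative names):

```r
timeSeq <- seq(as.POSIXct("2015-01-01 12:00:00", tz = "UTC"),
               as.POSIXct("2015-01-01 13:00:00", tz = "UTC"), by = "min")
timeDF <- data.frame(time = timeSeq)

# difftime() with an explicit unit instead of `-`, whose unit is chosen automatically
fromFirst <- difftime(timeDF$time[1], timeDF$time, units = "mins")
head(fromFirst)

# Plain numbers in minutes, convenient for further arithmetic or a new column
timeDF$minsFromFirst <- as.numeric(fromFirst)
head(timeDF)
```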

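I don't seem to run into the same problem as you. Perhaps coerce everything to POSIXct first and then do the subtraction? Check the class of your data to make sure it really is POSIXct, and inspect the actual values you are subtracting; that may give you some insight.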
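The checks suggested above can be sketched as follows. `sampleDF` is a hypothetical stand-in for `trainDF`, built from the rows shown in the question; the round-trip comparison at the end is one way to confirm that the time of day survived the conversion:

```r
# Hypothetical stand-in for trainDF. read.csv() returns text here
# ("factor" with stringsAsFactors = TRUE, the default before R 4.0.0), never POSIXct.
sampleDF <- data.frame(
  Dates = c("2015-05-13 23:53:00", "2015-05-13 23:53:00",
            "2015-05-13 23:33:00", "2015-05-13 23:30:00"),
  stringsAsFactors = FALSE
)
print(class(sampleDF$Dates))  # "character": not a date-time class yet

# Convert the whole column once, then check the class of the result
parsed <- as.POSIXct(sampleDF$Dates, tz = "UTC")
print(class(parsed))

# Inspect the values actually being subtracted: formatting them back should
# reproduce the original strings if no time of day was lost
roundTrip <- format(parsed, "%Y-%m-%d %H:%M:%S")
print(data.frame(original = sampleDF$Dates, parsed = roundTrip))
print(all(roundTrip == sampleDF$Dates))
```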

Edit:

After downloading the file, this is how I ran it (trainDF is the data frame read from the file):

trainDF$Dates <- as.POSIXct(trainDF$Dates, tz = "UTC") # Coercing to POSIXct.
datesDiff <- trainDF[1, 1] - trainDF[, 1] # Taking the difference of each date with the first date.
head(datesDiff) # Printing out the head.

Result:

Time differences in secs
[1]    0    0 1200 1380 1380 1380

The only thing I did differently was to use the UTC time zone, which has no daylight-saving shifts, so I would not expect it to have any effect.

However, when I used exactly your method, I got the same result as you:

> diff<-as.POSIXct(trainDF[1,1])-as.POSIXct(trainDF[,1])
> head(diff)
Time differences in hours
[1] 23.88333 23.88333 23.88333 23.88333 23.88333 23.88333

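So there is something about your method, but I can't say what. I find it is usually safer to do the coercion on one line and then do the math separately, rather than doing both at once.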
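One possible explanation, not verified against the original train.csv: when `as.POSIXct()` gets character (or factor) input without a `format`, it tries a list of candidate formats and uses the first one that parses every non-missing element of the vector. Converting the single value `trainDF[1,1]` succeeds with the full date-time format, but if any element of the whole column fails that format (possibly a local time that does not exist because of a daylight-saving change in your default time zone, which UTC does not have), the column can fall back to the date-only format `"%Y-%m-%d"`, silently dropping the time. Every row would then sit at midnight, which fits the constant 23.88333 hours (23:53) for the rows shown. Passing `format` and `tz` explicitly removes the guessing. A minimal sketch, assuming the column looks like `"2015-05-13 23:53:00"` and that UTC is acceptable; the inline rows (copied from the question) stand in for `read.csv("train.csv")`:

```r
trainDF <- data.frame(
  Dates = c("2015-05-13 23:53:00", "2015-05-13 23:53:00",
            "2015-05-13 23:33:00", "2015-05-13 23:30:00",
            "2003-01-06 00:01:00"),
  stringsAsFactors = FALSE
)

# Explicit format: no format guessing, so the time of day cannot be dropped.
# as.character() also covers a factor column (the older read.csv default).
trainDF$Dates <- as.POSIXct(as.character(trainDF$Dates),
                            format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
stopifnot(!anyNA(trainDF$Dates))  # strings not matching the format would become NA

# Subtract every entry from the first one, in a fixed unit
datesDiff <- difftime(trainDF$Dates[1], trainDF$Dates, units = "mins")
head(as.numeric(datesDiff))
```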