I have date and time data in an odd format and need to calculate the differences in R. Any help would be greatly appreciated. Thanks.
TimeStart TimeEnd
May 1 2016 1:00AM May 1 2016 1:28AM
May 1 2016 1:01AM May 1 2016 1:21AM
May 1 2016 1:00PM May 1 2016 1:13PM
May 1 2016 1:00PM May 4 2016 5:42PM
May 1 2016 1:02PM May 1 2016 1:37PM
May 1 2016 1:02PM May 1 2016 1:14PM
May 1 2016 1:02PM May 1 2016 1:39PM
May 1 2016 1:02PM May 1 2016 1:18PM
Answer 0 (score: 0)
See ?strptime for how to specify the format of date/time objects.
library(data.table)
dat <- read.table(text = "May 1 2016 1:00AM May 1 2016 1:28AM
May 1 2016 1:01AM May 1 2016 1:21AM
May 1 2016 1:00PM May 1 2016 1:13PM
May 1 2016 1:00PM May 4 2016 5:42PM
May 1 2016 1:02PM May 1 2016 1:37PM
May 1 2016 1:02PM May 1 2016 1:14PM
May 1 2016 1:02PM May 1 2016 1:39PM
May 1 2016 1:02PM May 1 2016 1:18PM")
dat2 <- setDT(dat)[ , list(start = paste(V1, V2, V3, V4),
end = paste(V5, V6, V7, V8))]
# %I (12-hour clock) is needed for %p (AM/PM) to take effect; with %H the PM rows parse as AM
dat2[] <- lapply(dat2, as.POSIXct, format = "%B %d %Y %I:%M%p")
dat2[ , diff := end - start]
dat2
# start end diff
# 1: 2016-05-01 01:00:00 2016-05-01 01:28:00 28 mins
# 2: 2016-05-01 01:01:00 2016-05-01 01:21:00 20 mins
# 3: 2016-05-01 13:00:00 2016-05-01 13:13:00 13 mins
# 4: 2016-05-01 13:00:00 2016-05-04 17:42:00 4602 mins
# 5: 2016-05-01 13:02:00 2016-05-01 13:37:00 35 mins
# 6: 2016-05-01 13:02:00 2016-05-01 13:14:00 12 mins
# 7: 2016-05-01 13:02:00 2016-05-01 13:39:00 37 mins
# 8: 2016-05-01 13:02:00 2016-05-01 13:18:00 16 mins
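When writing a strptime format for AM/PM times, the hour specifier matters: ?strptime notes that %p is used together with %I and not with %H. The following is a minimal sketch of the difference, assuming an English locale (so that "May" and "PM" are recognized) and using tz = "UTC" purely for illustration; the variable names are made up for this example.

```r
# One PM timestamp in the same layout as the question's data
x <- "May 1 2016 1:00PM"

# %H is the 24-hour clock, so the PM marker does not shift the hour
with_H <- as.POSIXct(x, format = "%b %d %Y %H:%M%p", tz = "UTC")

# %I is the 12-hour clock, so %p is applied and 1:00PM becomes 13:00
with_I <- as.POSIXct(x, format = "%b %d %Y %I:%M%p", tz = "UTC")

format(with_H, "%H:%M")  # "01:00"
format(with_I, "%H:%M")  # "13:00"
```

Differences between start and end can come out the same either way when both fall in the same half of the day, so a wrong hour specifier is easy to overlook.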
Answer 1 (score: 0)
With dplyr:
library(dplyr)
# parse datetimes
df %>% mutate_all(as.POSIXct, format = '%b %d %Y %I:%M%p') %>%
# add column with time difference
mutate(elapsed = TimeEnd - TimeStart)
## TimeStart TimeEnd elapsed
## 1 2016-05-01 01:00:00 2016-05-01 01:28:00 28 mins
## 2 2016-05-01 01:01:00 2016-05-01 01:21:00 20 mins
## 3 2016-05-01 13:00:00 2016-05-01 13:13:00 13 mins
## 4 2016-05-01 13:00:00 2016-05-04 17:42:00 4602 mins
## 5 2016-05-01 13:02:00 2016-05-01 13:37:00 35 mins
## 6 2016-05-01 13:02:00 2016-05-01 13:14:00 12 mins
## 7 2016-05-01 13:02:00 2016-05-01 13:39:00 37 mins
## 8 2016-05-01 13:02:00 2016-05-01 13:18:00 16 mins
Or, equivalently, in base R:
df$TimeStart <- as.POSIXct(df$TimeStart, format = '%b %d %Y %I:%M%p')
df$TimeEnd <- as.POSIXct(df$TimeEnd, format = '%b %d %Y %I:%M%p')
df$elapsed <- df$TimeEnd - df$TimeStart
df
## TimeStart TimeEnd elapsed
## 1 2016-05-01 01:00:00 2016-05-01 01:28:00 28 mins
## 2 2016-05-01 01:01:00 2016-05-01 01:21:00 20 mins
## 3 2016-05-01 13:00:00 2016-05-01 13:13:00 13 mins
## 4 2016-05-01 13:00:00 2016-05-04 17:42:00 4602 mins
## 5 2016-05-01 13:02:00 2016-05-01 13:37:00 35 mins
## 6 2016-05-01 13:02:00 2016-05-01 13:14:00 12 mins
## 7 2016-05-01 13:02:00 2016-05-01 13:39:00 37 mins
## 8 2016-05-01 13:02:00 2016-05-01 13:18:00 16 mins
Data:
df <- structure(list(TimeStart = c("May 1 2016 1:00AM", "May 1 2016 1:01AM",
"May 1 2016 1:00PM", "May 1 2016 1:00PM", "May 1 2016 1:02PM",
"May 1 2016 1:02PM", "May 1 2016 1:02PM", "May 1 2016 1:02PM"
), TimeEnd = c("May 1 2016 1:28AM", "May 1 2016 1:21AM", "May 1 2016 1:13PM",
"May 4 2016 5:42PM", "May 1 2016 1:37PM", "May 1 2016 1:14PM",
"May 1 2016 1:39PM", "May 1 2016 1:18PM")), class = "data.frame", row.names = c(NA,
-8L), .Names = c("TimeStart", "TimeEnd"))
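Subtracting two POSIXct values lets R pick the units of the result automatically, which can be awkward if the numbers are used later. As a hedged sketch, difftime() with an explicit units argument fixes the unit, and as.numeric() strips it to a plain number. The single start/end pair below is taken from row 4 of the data, and tz = "UTC" is an assumption made only to avoid daylight-saving effects in the example.

```r
fmt <- "%b %d %Y %I:%M%p"  # assumes an English locale for "May" and "PM"

start <- as.POSIXct("May 1 2016 1:00PM", format = fmt, tz = "UTC")
end   <- as.POSIXct("May 4 2016 5:42PM", format = fmt, tz = "UTC")

# Request specific units instead of relying on the automatic choice
elapsed_min <- as.numeric(difftime(end, start, units = "mins"))   # 4602
elapsed_hr  <- as.numeric(difftime(end, start, units = "hours"))  # 76.7
```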
Answer 2 (score: 0)
I prefer lubridate for this kind of thing. It is a simple package that parses date-times using a consistent naming scheme.
library(lubridate)
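First, parse both columns with mdy_hm (because apply returns a matrix, the result holds plain numbers of seconds rather than date-time objects):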
df2 <- apply(df, 2, mdy_hm)
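Then compute the number of seconds in each duration. If there are enough seconds, it automatically tells you roughly how many minutes that is.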
dseconds(df2[,2]-df2[,1])
The result looks like this:
[1] "1680s (~28 minutes)" "1200s (~20 minutes)"
[3] "780s (~13 minutes)" "276120s (~4602 minutes)"
[5] "2100s (~35 minutes)" "720s (~12 minutes)"
[7] "2220s (~37 minutes)" "960s (~16 minutes)"
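If date-time objects are wanted rather than a numeric matrix, one possible variant is to parse each vector directly and convert the difference with as.duration(). This is only a sketch under assumptions: it uses a hypothetical two-row sample in the question's format, and mdy_hm() returns times in UTC by default.

```r
library(lubridate)

# Hypothetical two-row sample in the same layout as the question's data
start <- mdy_hm(c("May 1 2016 1:00AM", "May 1 2016 1:00PM"))
end   <- mdy_hm(c("May 1 2016 1:28AM", "May 4 2016 5:42PM"))

# Subtraction gives a difftime; as.duration() turns it into a lubridate Duration
elapsed <- as.duration(end - start)

# Express the durations as plain numbers of minutes
elapsed_min <- as.numeric(elapsed, "minutes")
```

Keeping start and end as POSIXct vectors means they can still be formatted or compared as dates afterwards.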