我正在寻找一种计算每组ID时差的方法。以下是我的数据的一部分:
ID road beginTime endTime Mon Tue Wed Thu Fri Sat
666 757 9:00 AM 11:45 AM S
555 758 1:55 PM 3:45 PM M W
555 759 10:40 AM 12:30 PM M W
555 760 4:00 PM 5:50 PM Tue R
444 761 3:00 PM 4:25 PM Tue R
444 762 4:30 PM 7:15 PM M
444 763 12:50 PM 2:40 PM Fri
444 764 10:40 AM 11:35 AM Tue R
222 765 11:45 AM 2:30 PM M W
222 766 6:00 PM 9:40 PM R
333 767 8:30 AM 11:15 AM M W
333 768 8:30 AM 11:15 AM Tue R
333 769 1:25 PM 2:50 PM Tue R
333 770 11:45 AM 1:10 PM M W
输出dput():
structure(list(ID = c(666L, 555L, 555L, 555L, 444L, 444L, 444L,
444L, 222L, 222L, 333L, 333L, 333L, 333L), road = 757:770, beginTime = structure(c(11L,
2L, 3L, 7L, 6L, 8L, 5L, 3L, 4L, 9L, 10L, 10L, 1L, 4L), .Label = c("1:25 PM",
"1:55 PM", "10:40 AM", "11:45 AM", "12:50 PM", "3:00 PM", "4:00 PM",
"4:30 PM", "6:00 PM", "8:30 AM", "9:00 AM"), class = "factor"),
endTime = structure(c(4L, 9L, 5L, 11L, 10L, 12L, 7L, 3L,
6L, 13L, 2L, 2L, 8L, 1L), .Label = c("1:10 PM", "11:15 AM",
"11:35 AM", "11:45 AM", "12:30 PM", "2:30 PM", "2:40 PM",
"2:50 PM", "3:45 PM", "4:25 PM", "5:50 PM", "7:15 PM", "9:40 PM"
), class = "factor"), Mon = structure(c(1L, 2L, 2L, 1L, 1L,
2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L), .Label = c("", "M"), class = "factor"),
Tue = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L,
1L, 2L, 2L, 1L), .Label = c("", "Tue"), class = "factor"),
Wed = structure(c(1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L,
2L, 1L, 1L, 2L), .Label = c("", "W"), class = "factor"),
Thu = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L,
1L, 2L, 2L, 1L), .Label = c("", "R"), class = "factor"),
Fri = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 1L), .Label = c("", "Fri"), class = "factor"),
Sat = structure(c(2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L), .Label = c("", "S"), class = "factor")), .Names = c("ID",
"road", "beginTime", "endTime", "Mon", "Tue", "Wed", "Thu", "Fri",
"Sat"), class = "data.frame", row.names = c(NA, -14L))
每个ID在当天的不同时间(开始时间,结束时间)在不同的道路(道路)上行驶。我想为每个ID计算等待(非驾驶)时间。例如,ID = 555在周一和周三开车。第一期是上午10:40至下午12:30。它等待了1.41小时,然后在1:55 - 3:45之间开始了另一段时间。等待时间为1.41小时是我需要的。星期二和星期四,这个身份证明还有另一个等待时间。对于ID = 666,它仅在星期六开车一段时间,因此等待时间为0.我的数据的难度在于每个ID每天都有不同的时段。有什么建议?非常感谢!
答案 0 :(得分:0)
使用我在评论中提到的“长”格式会让事情变得更容易。
首先,我会稍微清理一下您的数据:将因子转换为字符串,然后将字符串转换为字符串(df
将数据转换为上面的dput
):
library(dplyr)
# small helper function
astime <- function(x) as.POSIXct(x, format = "%I:%M %p")
df2 <- df %>%
mutate_each(funs(as.character), beginTime:Sat) %>%
mutate_each(funs(astime), beginTime, endTime)
head(df2)
# ID road beginTime endTime Mon Tue Wed Thu Fri Sat
# 1 666 757 2016-06-21 09:00:00 2016-06-21 11:45:00 S
# 2 555 758 2016-06-21 13:55:00 2016-06-21 15:45:00 M W
# 3 555 759 2016-06-21 10:40:00 2016-06-21 12:30:00 M W
# 4 555 760 2016-06-21 16:00:00 2016-06-21 17:50:00 Tue R
# 5 444 761 2016-06-21 15:00:00 2016-06-21 16:25:00 Tue R
# 6 444 762 2016-06-21 16:30:00 2016-06-21 19:15:00 M
(不要担心日期都是错误的,应该被忽略。)现在我将从宽转换为长并删除那些天是空字符串的实例:
library(tidyr)
df3 <- df2 %>%
gather(day, ign, Mon:Sat) %>%
filter(ign != "") %>%
select(-ign)
head(df3)
# ID road beginTime endTime day
# 1 555 758 2016-06-21 13:55:00 2016-06-21 15:45:00 Mon
# 2 555 759 2016-06-21 10:40:00 2016-06-21 12:30:00 Mon
# 3 444 762 2016-06-21 16:30:00 2016-06-21 19:15:00 Mon
# 4 222 765 2016-06-21 11:45:00 2016-06-21 14:30:00 Mon
# 5 333 767 2016-06-21 08:30:00 2016-06-21 11:15:00 Mon
# 6 333 770 2016-06-21 11:45:00 2016-06-21 13:10:00 Mon
现在我将它们分组并计算等待的时间:
df4 <- df3 %>%
arrange(ID, day, beginTime) %>%
group_by(ID, day) %>%
mutate(
waitTime = difftime(beginTime, dplyr::lag(endTime, default = beginTime[1]), units='secs')
)
head(df4)
# Source: local data frame [6 x 6]
# Groups: ID, day [5]
# ID road beginTime endTime day waitTime
# <int> <int> <time> <time> <chr> <S3: difftime>
# 1 222 765 2016-06-21 11:45:00 2016-06-21 14:30:00 Mon 0 secs
# 2 222 766 2016-06-21 18:00:00 2016-06-21 21:40:00 Thu 0 secs
# 3 222 765 2016-06-21 11:45:00 2016-06-21 14:30:00 Wed 0 secs
# 4 333 767 2016-06-21 08:30:00 2016-06-21 11:15:00 Mon 0 secs
# 5 333 770 2016-06-21 11:45:00 2016-06-21 13:10:00 Mon 1800 secs
# 6 333 768 2016-06-21 08:30:00 2016-06-21 11:15:00 Thu 0 secs
您可以轻松过滤某些人等待的时间:
df4 %>%
filter(waitTime > 0)
# Source: local data frame [8 x 6]
# Groups: ID, day [8]
# ID road beginTime endTime day waitTime
# <int> <int> <time> <time> <chr> <S3: difftime>
# 1 333 770 2016-06-21 11:45:00 2016-06-21 13:10:00 Mon 1800 secs
# 2 333 769 2016-06-21 13:25:00 2016-06-21 14:50:00 Thu 7800 secs
# 3 333 769 2016-06-21 13:25:00 2016-06-21 14:50:00 Tue 7800 secs
# 4 333 770 2016-06-21 11:45:00 2016-06-21 13:10:00 Wed 1800 secs
# 5 444 761 2016-06-21 15:00:00 2016-06-21 16:25:00 Thu 12300 secs
# 6 444 761 2016-06-21 15:00:00 2016-06-21 16:25:00 Tue 12300 secs
# 7 555 758 2016-06-21 13:55:00 2016-06-21 15:45:00 Mon 5100 secs
# 8 555 758 2016-06-21 13:55:00 2016-06-21 15:45:00 Wed 5100 secs
在这种情况下,您会看到周一和周三的ID 555示例有一个1.41小时(5100秒)的中断,而ID 666没有等待时间。