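I have a dataset of monitor readings from a location-tracking system. Unfortunately, I don't know how to generate a random reproducible version of it, so here are the first few records: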
Time TagID MonitorID Location
2017-10-31 23:03:26 1427435 1352303 A4.18
2017-10-31 23:06:02 1427435 1352303 A4.18
2017-10-31 23:06:20 1427435 1352303 A4.18
2017-10-31 23:06:50 1427435 1352303 A4.18
2017-10-31 23:06:51 1427435 1352303 A4.18
2017-10-31 23:07:20 1427435 1352303 A4.18
.
.
.
2017-11-22 22:29:55 1427435 1349044 B6.24
2017-11-22 22:30:22 1427435 286748 B6.41
2017-11-22 22:30:25 1427435 1349044 B6.24
2017-11-22 22:30:40 1427435 286748 B6.41
2017-11-22 22:30:41 1427435 286748 B6.41
2017-11-22 22:30:55 1427435 1349044 B6.24
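I am trying to determine how long an RFID tag spends at a particular monitor location by measuring how much time passes before the MonitorID reading changes. I do this with the following function that I wrote: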
elapsed_time <- function(x) {
  # Prepare variables
  current_monitor <- x$MonitorID[1]
  start_time <- x$Time[1]
  end_time <- NULL
  output <- data.frame("Date" = as.POSIXct(as.character()), "MonitorID" = as.integer(),
                       "Minutes_elapsed" = as.integer())
  # For loop to iterate over rows
  for (i in 1:nrow(x)) {
    # if the new monitor is the same as the old one then go to next iteration
    # otherwise calculate the time between dates, add values to output
    if (x$MonitorID[i] == current_monitor & i != nrow(x)) {
      next
    } else {
      # Mark what the time is when the location changes
      end_time <- x$Time[i]
      # Calculate time difference
      time_spent <- difftime(end_time, start_time, units = "mins")
      # Create temporary data frame to append to output
      temp <- data.frame(start_time, current_monitor, time_spent)
      # Append temp to output
      output <- rbind(output, setNames(temp, names(output)))
      # Set the new start time to the current time
      start_time <- end_time
      # Set the current monitor tracker to the new monitor
      current_monitor <- x$MonitorID[i]
    }
  }
  # Add monitor mappings to output
  output <- left_join(output, Mmappings[,c(1,2)], by="MonitorID")
  return(output)
}
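You can ignore the last line; it just maps the actual location names back onto the MonitorID readings. The function works as intended, but it takes a long time to run for just one monitor (about 4 minutes), and I want to use it inside another function for roughly 95 monitors at once. I'm sure there is a more efficient way to write this function that would cut down the run time.
Edit: here is a sample of the output, as requested: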
Date MonitorID Minutes_elapsed Location
1 2017-10-31 23:03:26 1352303 3.36666667 mins A4.18
2 2017-10-31 23:06:48 0 0.03333333 mins A4.20
3 2017-10-31 23:06:50 1352303 0.45000000 mins A4.18
4 2017-10-31 23:07:17 0 0.05000000 mins A4.20
5 2017-10-31 23:07:20 1352303 0.45000000 mins A4.18
6 2017-10-31 23:07:47 0 0.05000000 mins A4.20
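In this case the time between changes is short, because readings sometimes bounce to other monitors, but that is not relevant here.
Answer 0 (score: 0)
I will try to build a sample data frame: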
library(dplyr)      # for %>%, group_by() and summarize()
library(lubridate)  # for ymd_hms() and minutes()

df1 <- data.frame(
  Time = c("2017-10-31 23:03:26", "2017-10-31 23:06:02", "2017-10-31 23:06:20",
           "2017-10-31 23:06:50", "2017-10-31 23:06:51", "2017-10-31 23:07:20"),
  TagID = rep(1427435, 6),
  MonitorID = rep(1352303, 6),
  Location = rep("A4.18", 6)
)
df1$Time <- ymd_hms(df1$Time)
# Second block of readings: 30 minutes later, on a different monitor
df2 <- df1
df2$Time <- df2$Time + minutes(30)
df2$MonitorID <- df2$MonitorID + 1
df2$Location <- "A4.19"
df <- rbind(df1, df2)
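So, if your data frame looks like the one above, you can compute the elapsed time for each MonitorID (in minutes) with the following code:
result <- df %>%
  group_by(MonitorID) %>%
  summarize(ElapsedTime = difftime(tail(Time, 1), head(Time, 1), units = "mins"))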
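Grouping by MonitorID alone measures the span from the first to the last reading of each monitor, so separate visits to the same monitor get merged. If each visit should be timed on its own, as in the question's loop, one option is to find the rows where MonitorID changes and subtract consecutive start times. The following is only a sketch in base R: the name `elapsed_time_vec` and the small `readings` data frame (including the second MonitorID) are made up for illustration, and it assumes the rows are already sorted by Time for a single tag.

```r
# Hypothetical helper: a vectorized take on the question's elapsed_time() loop.
# Assumes x has at least one row, is sorted by Time, and Time is POSIXct.
elapsed_time_vec <- function(x) {
  n <- nrow(x)
  # TRUE on the first row and wherever MonitorID differs from the previous row
  is_start <- c(TRUE, x$MonitorID[-1] != x$MonitorID[-n])
  start_time <- x$Time[is_start]
  # A visit ends when the next one starts; the last visit ends at the final reading
  end_time <- c(start_time[-1], x$Time[n])
  data.frame(
    Date = start_time,
    MonitorID = x$MonitorID[is_start],
    Minutes_elapsed = difftime(end_time, start_time, units = "mins")
  )
}

# Illustrative data only (the second MonitorID is invented)
readings <- data.frame(
  Time = as.POSIXct(c("2017-10-31 23:03:26", "2017-10-31 23:06:02",
                      "2017-10-31 23:06:20", "2017-10-31 23:30:20"), tz = "UTC"),
  MonitorID = c(1352303, 1352303, 1352304, 1352304)
)
visits <- elapsed_time_vec(readings)
```

Because nothing is appended row by row with `rbind()`, the work is a handful of vector operations regardless of how many readings there are. One difference from the loop: if the very last reading is on a new monitor, this sketch reports it as an extra visit of 0 minutes, whereas the loop does not emit a row for it.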
Answer 1 (score: 0)
Does this help?
library(tidyverse) # for easy data manipulation
library(lubridate) # for dealing with dates
# create the sample data (tribble() replaces the deprecated frame_data())
myDf <- tribble(
  ~Time, ~TagID, ~MonitorID, ~Location,
  "2017-10-31 23:03:26", 1427435, 1352303, "A4.18",
  "2017-10-31 23:06:02", 1427435, 1352303, "A4.18",
  "2017-10-31 23:06:20", 1427435, 1352303, "A4.18",
  "2017-10-31 23:06:50", 1427435, 1352303, "A4.18",
  "2017-10-31 23:06:51", 1427435, 1352303, "A4.18",
  "2017-10-31 23:07:20", 1427435, 1352303, "A4.18",
  "2017-11-22 22:29:55", 1427435, 1349044, "B6.24",
  "2017-11-22 22:30:22", 1427435, 286748, "B6.41",
  "2017-11-22 22:30:25", 1427435, 1349044, "B6.24",
  "2017-11-22 22:30:40", 1427435, 286748, "B6.41",
  "2017-11-22 22:30:41", 1427435, 286748, "B6.41",
  "2017-11-22 22:30:55", 1427435, 1349044, "B6.24"
)
# convert the Time strings to date-times
# and (important!) sort the data frame
myDf <- myDf %>%
  mutate(Time = as_datetime(Time)) %>%
  arrange(TagID, Time)
myDf %>%
  mutate(priorIDtheSame = MonitorID == lag(MonitorID)) %>%
  mutate(priorIDtheSame = replace(priorIDtheSame, is.na(priorIDtheSame), FALSE)) %>%
  mutate(nextIDtheSame = MonitorID == lead(MonitorID)) %>%
  mutate(nextIDtheSame = replace(nextIDtheSame, is.na(nextIDtheSame), FALSE)) %>%
  # we simply remove all the rows in between the first and last at one location
  filter(!(priorIDtheSame & nextIDtheSame)) %>%
  # calculate the time difference
  mutate(timeAtThisLocation = Time - lag(Time)) %>%
  # and make sure it is only calculated where we need it
  mutate(timeAtThisLocation = replace(timeAtThisLocation, !priorIDtheSame, NA))
which results in:
# A tibble: 8 x 7
Time TagID MonitorID Location priorIDtheSame nextIDtheSame timeAtThisLocation
<dttm> <dbl> <dbl> <chr> <lgl> <lgl> <time>
1 2017-10-31 22:03:26 1427435 1352303 A4.18 FALSE TRUE NA secs
2 2017-10-31 22:07:20 1427435 1352303 A4.18 TRUE FALSE 234 secs
3 2017-11-22 21:29:55 1427435 1349044 B6.24 FALSE FALSE NA secs
4 2017-11-22 21:30:22 1427435 286748 B6.41 FALSE FALSE NA secs
5 2017-11-22 21:30:25 1427435 1349044 B6.24 FALSE FALSE NA secs
6 2017-11-22 21:30:40 1427435 286748 B6.41 FALSE TRUE NA secs
7 2017-11-22 21:30:41 1427435 286748 B6.41 TRUE FALSE 1 secs
8 2017-11-22 21:30:55 1427435 1349044 B6.24 FALSE FALSE NA secs
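Since the question mentions running this for many tags, the same idea (time between the first and last reading of each uninterrupted stay at a monitor) can also be written with a per-tag run counter, which gives one row per stay. This is a rough sketch rather than a drop-in replacement: the `readings` tibble, the second TagID, and the column names `run`, `first_seen` and `last_seen` are invented for illustration, and `.groups = "drop"` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Illustrative readings for two tags; the second TagID and its times are made up
readings <- tibble(
  Time = as.POSIXct(c("2017-10-31 23:03:26", "2017-10-31 23:07:20",
                      "2017-11-22 22:30:40", "2017-11-22 22:30:41",
                      "2017-10-31 23:00:00", "2017-10-31 23:10:00"), tz = "UTC"),
  TagID = c(1427435, 1427435, 1427435, 1427435, 1427436, 1427436),
  MonitorID = c(1352303, 1352303, 286748, 286748, 1352303, 1352303)
)

visits <- readings %>%
  arrange(TagID, Time) %>%
  group_by(TagID) %>%
  # run goes up by one each time MonitorID changes within a tag
  mutate(run = cumsum(MonitorID != lag(MonitorID, default = first(MonitorID)))) %>%
  group_by(TagID, run, MonitorID) %>%
  summarise(
    first_seen = first(Time),
    last_seen = last(Time),
    # time between the first and last reading of this stay
    timeAtThisLocation = difftime(last(Time), first(Time), units = "secs"),
    .groups = "drop"
  )
```

A stay with a single reading comes out as 0 secs here rather than NA, so those rows may need filtering depending on how bounced readings should be treated.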