How to calculate the time spent at a location per unique ID per day?

Time: 2020-05-26 13:54:04

Tags: r time-series

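I am currently working with the smartGPA dataset (https://studentlife.cs.dartmouth.edu/smartgpa.pdf) and am trying to calculate how much time each student (UID) spends studying at a location each day, so that I can eventually get the average study time for each student. For each student I have timestamps and WiFi-based location data. For example, the data show that a student was in the library from 6:55:54 to 7:05:34, and I want to subtract these times to get the time spent studying.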

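I created an extra column holding the time difference between row x+1 and row x. If the difference between two rows is greater than 15 minutes, I need to stop summing the time differences and start a new study instance. Is there a simple way to do this? For example, I would sum timediff for rows 1 to 8 and discard rows 9, 10 and 11, because their time differences are greater than 15 minutes.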
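One way to do this is to number the sessions with a running count of "long gaps" and then aggregate per session. Below is a minimal base-R sketch under stated assumptions: the toy data frame `wifi` (columns `uid`, `day`, `timestamp` in seconds) is hypothetical and only stands in for the wifi_location table, a gap longer than 15 minutes starts a new session, and changes of location are ignored. A session's duration is its last timestamp minus its first, which equals the sum of the within-session time differences.

```r
# Hypothetical toy data: two students on one day, timestamps in seconds
wifi <- data.frame(
  uid       = c("0", "0", "0", "0", "1", "1"),
  day       = c(3, 3, 3, 3, 3, 3),
  timestamp = c(0, 300, 600, 3000, 100, 400)
)

gap_limit <- 15 * 60  # assumption: a gap over 15 minutes ends a session

# Sort so consecutive rows belong to the same student and day
wifi <- wifi[order(wifi$uid, wifi$day, wifi$timestamp), ]

# Gap to the previous row within each uid/day group (NA for the first row)
wifi$gap <- ave(wifi$timestamp, wifi$uid, wifi$day,
                FUN = function(t) c(NA, diff(t)))

# A new session starts at the first row of a group or after a long gap
wifi$session <- cumsum(is.na(wifi$gap) | wifi$gap > gap_limit)

# Session duration = last timestamp minus first timestamp
sessions <- aggregate(timestamp ~ uid + day + session, data = wifi,
                      FUN = function(t) max(t) - min(t))
names(sessions)[names(sessions) == "timestamp"] <- "seconds"

# Total study time per student per day, then the mean over days per student
daily <- aggregate(seconds ~ uid + day, data = sessions, FUN = sum)
avg_per_student <- aggregate(seconds ~ uid, data = daily, FUN = mean)
```

With this toy data, student "0" has one 600-second session and one single-row (0-second) session; if a visit to a different location should also end a session, the location could be added to the grouping in `ave()`.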


structure(list(timestamp = c(1364385354L, 1364385374L, 1364385384L, 
1364385454L, 1364385763L, 1364385856L, 1364385868L, 1364385934L, 
1364392663L, 1364392681L, 1364397495L, 1364397505L, 1364397923L, 
1364411988L, 1364412078L, 1364412163L, 1364412406L, 1364412453L, 
1364412968L, 1364413005L), location = c("in[baker-berry]", "in[baker-berry]", 
"in[baker-berry]", "in[baker-berry]", "in[baker-berry]", "in[baker-berry]", 
"in[baker-berry]", "in[baker-berry]", "in[dana-library]", "in[dana-library]", 
"in[dana-library]", "in[dana-library]", "in[baker-berry]", "in[baker-berry]", 
"in[baker-berry]", "in[baker-berry]", "in[baker-berry]", "in[baker-berry]", 
"in[baker-berry]", "in[baker-berry]"), uid = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), .Label = c("0", "1", "2", "3", "4", "5", "6", "7", 
"8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", 
"19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", 
"30", "31", "32", "33", "34", "35", "36", "37", "38", "39", "40", 
"41", "42", "43", "44", "45", "46", "47", "48", "49", "50", "51", 
"52", "53", "54", "55", "56", "57", "58", "59"), class = "factor"), 
    hour = c(12L, 12L, 12L, 12L, 13L, 13L, 13L, 13L, 14L, 14L, 
    16L, 16L, 16L, 20L, 20L, 20L, 20L, 20L, 20L, 20L), epoch = structure(c(2L, 
    2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 
    4L, 4L, 4L, 4L), .Label = c("nig", "mor", "aft", "eve"), class = "factor"), 
    day = c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 
    3, 3, 3), week = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1), weekday = structure(c(3L, 3L, 3L, 3L, 
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
    3L), .Label = c("mon", "tue", "wed", "thu", "fri", "sat", 
    "sun"), class = "factor"), time = structure(c(1364385354, 
    1364385374, 1364385384, 1364385454, 1364385763, 1364385856, 
    1364385868, 1364385934, 1364392663, 1364392681, 1364397495, 
    1364397505, 1364397923, 1364411988, 1364412078, 1364412163, 
    1364412406, 1364412453, 1364412968, 1364413005), tzone = "EST", class = c("POSIXct", 
    "POSIXt"))), row.names = c(NA, -20L), class = c("timestamp_SL_tbl", 
"SL_tbl", "tbl_df", "tbl", "data.frame"), schema = "sensing", table = "wifi_location")

2 answers:

Answer 0 (score: 0)

This can be done with the dplyr package and the as.POSIXct() function. If you have not installed dplyr yet, install it first and then load it.

install.packages("dplyr") 
library(dplyr)

Next, create a new column holding this time difference. It is also better not to modify the original data.

This can be done as follows:

# time1 and time2 are placeholders for your two time columns
data.frame <- data.frame %>%
  mutate(time_difference = as.numeric(as.POSIXct(time1)) - as.numeric(as.POSIXct(time2)))

as.numeric() converts each time to seconds (since 1970-01-01), so the new column holds the difference in seconds.

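If you want the difference in minutes, divide the result by 60 (there are 60 seconds in a minute). If you want it in hours, divide the result by 3600 (there are 3600 seconds in an hour).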
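A quick base-R check of these conversions, using the 6:55:54 to 7:05:34 library visit from the question. The date and the UTC time zone are assumptions added only to build valid POSIXct values. `difftime()` with its `units` argument performs the same conversion without manual division.

```r
# Hypothetical date and time zone; only the clock times come from the question
t_start <- as.POSIXct("2013-03-27 06:55:54", tz = "UTC")
t_end   <- as.POSIXct("2013-03-27 07:05:34", tz = "UTC")

secs  <- as.numeric(t_end) - as.numeric(t_start)  # difference in seconds
mins  <- secs / 60                                # 60 seconds per minute
hours <- secs / 3600                              # 3600 seconds per hour

# difftime() can do the unit conversion itself
mins_difftime <- as.numeric(difftime(t_end, t_start, units = "mins"))
```

Here `secs` is 580 (9 minutes 40 seconds), and `mins` and `mins_difftime` agree.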

Say you want this difference in minutes.

Then you can create a new column:

data.frame <- data.frame %>%
  mutate(time_difference_minutes = time_difference / 60)  # seconds -> minutes

Alternatively, you can fold the division into the original column definition, although this is less readable:

data.frame <- data.frame %>%
  mutate(time_difference = (as.numeric(as.POSIXct(time1)) - as.numeric(as.POSIXct(time2))) / 60)

Hope this helps!

Answer 1 (score: 0)

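I solved this by computing the time difference between consecutive rows, as described in the question, and then removing every observation whose difference was greater than 15 minutes. I then summed the remaining differences for each student to get each student's total study time.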
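The approach described in this answer might look like the following base-R sketch. The data frame `wifi` and its values are hypothetical, and the 15-minute threshold is expressed as 900 seconds. Note that a student whose every difference exceeds the threshold drops out of the result entirely.

```r
# Hypothetical toy data: timestamps in seconds for two students
wifi <- data.frame(
  uid       = c("0", "0", "0", "0", "1", "1"),
  timestamp = c(0, 300, 600, 3000, 100, 400)
)

# Order rows so that differences are taken within each student
wifi <- wifi[order(wifi$uid, wifi$timestamp), ]

# Difference to the previous row of the same student (NA for the first row)
wifi$diff <- ave(wifi$timestamp, wifi$uid, FUN = function(t) c(NA, diff(t)))

# Drop the first row per student and any difference over 15 minutes
kept <- wifi[!is.na(wifi$diff) & wifi$diff <= 15 * 60, ]

# Total study time per student, in seconds
total <- aggregate(diff ~ uid, data = kept, FUN = sum)
```

Adding a `day` column to the ordering, the `ave()` grouping and the `aggregate()` formula would give per-day totals instead.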