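I am currently working with the smartGPA dataset (https://studentlife.cs.dartmouth.edu/smartgpa.pdf) and am trying to compute how much time each student (uid) spends studying at a location each day, so that I can eventually get the average study time per student. For each student I have timestamps and locations derived from wifi location data. For example, the data show that a student was in the library from 6:55:54 to 7:05:34, so I want to subtract those times to get the time spent studying.
I created an additional column holding the time difference between row x + 1 and row x. If the difference between two rows is greater than 15 minutes, I need to stop summing the time differences and start a new study instance. Is there a simple way to do this? For example, I would sum timediff for rows 1 through 8 and discard rows 9, 10, and 11 because their time differences are greater than 15 minutes.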
structure(list(timestamp = c(1364385354L, 1364385374L, 1364385384L,
1364385454L, 1364385763L, 1364385856L, 1364385868L, 1364385934L,
1364392663L, 1364392681L, 1364397495L, 1364397505L, 1364397923L,
1364411988L, 1364412078L, 1364412163L, 1364412406L, 1364412453L,
1364412968L, 1364413005L), location = c("in[baker-berry]", "in[baker-berry]",
"in[baker-berry]", "in[baker-berry]", "in[baker-berry]", "in[baker-berry]",
"in[baker-berry]", "in[baker-berry]", "in[dana-library]", "in[dana-library]",
"in[dana-library]", "in[dana-library]", "in[baker-berry]", "in[baker-berry]",
"in[baker-berry]", "in[baker-berry]", "in[baker-berry]", "in[baker-berry]",
"in[baker-berry]", "in[baker-berry]"), uid = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = c("0", "1", "2", "3", "4", "5", "6", "7",
"8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18",
"19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29",
"30", "31", "32", "33", "34", "35", "36", "37", "38", "39", "40",
"41", "42", "43", "44", "45", "46", "47", "48", "49", "50", "51",
"52", "53", "54", "55", "56", "57", "58", "59"), class = "factor"),
hour = c(12L, 12L, 12L, 12L, 13L, 13L, 13L, 13L, 14L, 14L,
16L, 16L, 16L, 20L, 20L, 20L, 20L, 20L, 20L, 20L), epoch = structure(c(2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L,
4L, 4L, 4L, 4L), .Label = c("nig", "mor", "aft", "eve"), class = "factor"),
day = c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3), week = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1), weekday = structure(c(3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L), .Label = c("mon", "tue", "wed", "thu", "fri", "sat",
"sun"), class = "factor"), time = structure(c(1364385354,
1364385374, 1364385384, 1364385454, 1364385763, 1364385856,
1364385868, 1364385934, 1364392663, 1364392681, 1364397495,
1364397505, 1364397923, 1364411988, 1364412078, 1364412163,
1364412406, 1364412453, 1364412968, 1364413005), tzone = "EST", class = c("POSIXct",
"POSIXt"))), row.names = c(NA, -20L), class = c("timestamp_SL_tbl",
"SL_tbl", "tbl_df", "tbl", "data.frame"), schema = "sensing", table = "wifi_location")
Answer 0 (score: 0)
This can be done with the dplyr package and the as.POSIXct() function. If you have not installed dplyr yet, install it first and then load it.
install.packages("dplyr")
library(dplyr)
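Next, create a new column that holds this time difference. It is also best to leave the original columns unchanged.
This can be done as follows: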
data.frame <- data.frame %>%
  mutate(time_difference = as.numeric(as.POSIXct(time1)) - as.numeric(as.POSIXct(time2)))
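The as.numeric() function converts each time to a number of seconds, so the new column holds the difference in seconds as well.
If you want the difference in minutes, divide the result by 60 (there are 60 seconds in a minute). If you want it in hours, divide the result by 3600 (there are 3600 seconds in an hour).
Say you want the difference in minutes.
You can then create a new column: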
data.frame <- data.frame %>%
  mutate(time_difference_minutes = time_difference / 60)
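Or you can do it in one step, at the cost of readability, by putting the division into the definition of the original column.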
data.frame <- data.frame %>%
  mutate(time_difference = (as.numeric(as.POSIXct(time1)) - as.numeric(as.POSIXct(time2))) / 60)
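The snippets above assume two time columns, time1 and time2. The sample in the question has a single timestamp column, so the difference has to be taken between consecutive rows instead. A minimal sketch of that, assuming dplyr's lag() and a small hypothetical data frame named wifi (the name and the four rows are illustrative only):

```r
library(dplyr)

# Hypothetical toy data: one student, timestamps in seconds since the epoch
wifi <- data.frame(
  uid = c("0", "0", "0", "0"),
  timestamp = c(1364385354, 1364385374, 1364385384, 1364392663)
)

wifi_diff <- wifi %>%
  group_by(uid) %>%                         # never compare rows of different students
  arrange(timestamp, .by_group = TRUE) %>%  # make sure rows are in time order
  mutate(
    # seconds since the previous row; NA for each student's first row
    time_difference = timestamp - lag(timestamp),
    time_difference_minutes = time_difference / 60
  ) %>%
  ungroup()
```

Grouping by uid before calling lag() keeps the first row of one student from being compared with the last row of another.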
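Hope this helps!
Answer 1 (score: 0)
I solved this by taking the time difference between each pair of consecutive rows, as described in the question, and then dropping every observation whose difference exceeds 15 minutes. I then summed the remaining differences per student to get each student's total study time.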
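The answer does not include code, so the following is only one possible sketch of the approach it describes, not the author's actual implementation. It uses the timestamps from the question's sample; the names wifi, max_gap, sessions, and study_time are assumptions made for illustration. A gap longer than 15 minutes starts a new study instance, and each instance lasts from its first to its last timestamp, which equals the sum of the gaps kept inside it.

```r
library(dplyr)

# Timestamps from the question's sample (a single student, uid "0")
wifi <- data.frame(
  uid = "0",
  timestamp = c(1364385354, 1364385374, 1364385384, 1364385454, 1364385763,
                1364385856, 1364385868, 1364385934, 1364392663, 1364392681,
                1364397495, 1364397505, 1364397923, 1364411988, 1364412078,
                1364412163, 1364412406, 1364412453, 1364412968, 1364413005)
)

max_gap <- 15 * 60  # 15 minutes, in seconds

sessions <- wifi %>%
  group_by(uid) %>%
  arrange(timestamp, .by_group = TRUE) %>%
  mutate(
    gap = timestamp - lag(timestamp),           # seconds since the previous row
    new_session = is.na(gap) | gap > max_gap,   # first row or a long gap
    session = cumsum(new_session)               # running study-instance id
  ) %>%
  group_by(uid, session) %>%
  summarise(study_seconds = max(timestamp) - min(timestamp), .groups = "drop")

# Total study time per student, in minutes
study_time <- sessions %>%
  group_by(uid) %>%
  summarise(total_study_minutes = sum(study_seconds) / 60, .groups = "drop")
```

To get per-day figures, a day column could be added to both group_by() calls. This sketch ignores the location column, so a change of location within 15 minutes is still counted as the same study instance.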