I have a csv file containing data in the following format:
2014-01-05 23:05:42 Nicole 2014-01-05 22:41:26
2014-01-06 13:02:58 Albert 2014-01-06 11:58:14
2014-01-08 03:04:49 Nicole 2014-01-08 02:49:58
2014-01-08 03:04:49 Nicole 2014-01-08 02:49:58
2014-01-08 08:26:41 Marlen 2014-01-08 05:45:08
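The first date is the update date and the second date is the creation time. Basically, how can I calculate, for each person, the mean time elapsed between the two dates?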
Answer 0: (score: 3)
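Here is a possible implementation using data.table. We first convert both date columns to the POSIXct class and then calculate the mean difference (in minutes) for each name. You can add round if you like.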
library(data.table)
setDT(df)[, `:=`(V1 = as.POSIXct(V1), V3 = as.POSIXct(V3))]
df[, mean(difftime(V1, V3, units = "mins")), by = V2]
# V2 V1
# 1: Nicole 17.98889 mins
# 2: Albert 64.73333 mins
# 3: Marlen 161.55000 mins
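As a rough sketch of the rounding mentioned above: difftime values can be converted to plain numbers with as.numeric() before calling round(). The column name MeanMins and the explicit tz = "UTC" are assumptions added here for illustration (a fixed time zone keeps daylight-saving shifts out of the arithmetic); they are not part of the original answer.

```r
library(data.table)

# Same sample rows as in this answer, built as plain character columns
df <- data.frame(
  V1 = c("2014-01-05 23:05:42", "2014-01-06 13:02:58", "2014-01-08 03:04:49",
         "2014-01-08 03:04:49", "2014-01-08 08:26:41"),
  V2 = c("Nicole", "Albert", "Nicole", "Nicole", "Marlen"),
  V3 = c("2014-01-05 22:41:26", "2014-01-06 11:58:14", "2014-01-08 02:49:58",
         "2014-01-08 02:49:58", "2014-01-08 05:45:08"),
  stringsAsFactors = FALSE
)

# Convert both timestamp columns by reference (tz = "UTC" is an assumption)
setDT(df)[, `:=`(V1 = as.POSIXct(V1, tz = "UTC"),
                 V3 = as.POSIXct(V3, tz = "UTC"))]

# Mean difference per name, as a plain number of minutes rounded to 2 digits
# (MeanMins is a hypothetical column name)
res <- df[, .(MeanMins = round(as.numeric(mean(difftime(V1, V3, units = "mins"))), 2)),
          by = V2]
print(res)
```

Converting to numeric drops the "mins" label, so the unit is carried in the column name instead.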
Data
df <- structure(list(V1 = structure(c(1L, 2L, 3L, 3L, 4L), .Label = c("2014-01-05 23:05:42",
"2014-01-06 13:02:58", "2014-01-08 03:04:49", "2014-01-08 08:26:41"
), class = "factor"), V2 = structure(c(3L, 1L, 3L, 3L, 2L), .Label = c("Albert",
"Marlen", "Nicole"), class = "factor"), V3 = structure(c(1L,
2L, 3L, 3L, 4L), .Label = c("2014-01-05 22:41:26", "2014-01-06 11:58:14",
"2014-01-08 02:49:58", "2014-01-08 05:45:08"), class = "factor")), .Names = c("V1",
"V2", "V3"), class = "data.frame", row.names = c(NA, -5L))
Answer 1: (score: 1)
A similar option using dplyr (with the data from @DavidArenburg's post). We group by 'V2', convert the columns 'V1' and 'V3' to the POSIXct class using mutate_each, and then use summarise to get the mean of the time difference between 'V1' and 'V3'.
library(dplyr)
df %>%
group_by(V2) %>%
mutate_each(funs(as.POSIXct(.)), V1, V3) %>%
summarise(DiffMean = mean(difftime(V1, V3, units="mins")))
# V2 DiffMean
# (fctr) (dfft)
#1 Albert 64.73333 mins
#2 Marlen 161.55000 mins
#3 Nicole 17.98889 mins
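The mutate_each() and funs() helpers used above have since been deprecated in dplyr. A minimal sketch of the same pipeline using across(), which is available from dplyr 1.0.0, might look like the following. The explicit tz = "UTC" is an assumption added here so that the result does not depend on the local time zone.

```r
library(dplyr)

# Same sample rows as above, built as plain character columns
df <- data.frame(
  V1 = c("2014-01-05 23:05:42", "2014-01-06 13:02:58", "2014-01-08 03:04:49",
         "2014-01-08 03:04:49", "2014-01-08 08:26:41"),
  V2 = c("Nicole", "Albert", "Nicole", "Nicole", "Marlen"),
  V3 = c("2014-01-05 22:41:26", "2014-01-06 11:58:14", "2014-01-08 02:49:58",
         "2014-01-08 02:49:58", "2014-01-08 05:45:08"),
  stringsAsFactors = FALSE
)

res <- df %>%
  # Convert both timestamp columns in one step (tz = "UTC" is an assumption)
  mutate(across(c(V1, V3), ~ as.POSIXct(.x, tz = "UTC"))) %>%
  group_by(V2) %>%
  # Mean creation-to-update gap per person, in minutes
  summarise(DiffMean = mean(difftime(V1, V3, units = "mins")))
print(res)
```

The conversion is done before group_by() here because it does not depend on the grouping; the per-name means should match those shown in the original output.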