Data processing

Time: 2015-10-25 14:48:10

Tags: r

I have a csv file containing data in the following format:

2014-01-05 23:05:42 Nicole  2014-01-05 22:41:26     
2014-01-06 13:02:58 Albert  2014-01-06 11:58:14
2014-01-08 03:04:49 Nicole  2014-01-08 02:49:58
2014-01-08 03:04:49 Nicole  2014-01-08 02:49:58
2014-01-08 08:26:41 Marlen  2014-01-08 05:45:08

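The first date is the update time and the second date is the creation time. Basically, can I calculate the time elapsed between the two dates, and its mean for each person?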

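2 answers:

Answer 0 (score: 3)

Here is a possible implementation using data.table. We first convert both date columns to the POSIXct class and then compute the mean difference (in minutes) for each name. You can wrap the result in round() if you like.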

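library(data.table)
# Convert df to a data.table by reference and parse both date columns as POSIXct
setDT(df)[, `:=`(V1 = as.POSIXct(V1), V3 = as.POSIXct(V3))]
# Mean elapsed time (update time minus creation time) in minutes, per name
df[, mean(difftime(V1, V3, units = "mins")), by = V2]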
#        V2             V1
# 1: Nicole  17.98889 mins
# 2: Albert  64.73333 mins
# 3: Marlen 161.55000 mins
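
If rounding is wanted, one way it might be done is sketched below. This is only an illustration under a few assumptions that are not in the original post: the sample rows are re-created as character columns in a new table called `dt`, the timestamps are parsed as UTC, and the result column is named `mean_mins`.

```r
library(data.table)

# Sample rows from the question, re-created here so the block runs on its own
# (the name `dt` and the character columns are assumptions for illustration)
dt <- data.table(
  V1 = c("2014-01-05 23:05:42", "2014-01-06 13:02:58", "2014-01-08 03:04:49",
         "2014-01-08 03:04:49", "2014-01-08 08:26:41"),
  V2 = c("Nicole", "Albert", "Nicole", "Nicole", "Marlen"),
  V3 = c("2014-01-05 22:41:26", "2014-01-06 11:58:14", "2014-01-08 02:49:58",
         "2014-01-08 02:49:58", "2014-01-08 05:45:08")
)

# Parse both timestamp columns; tz = "UTC" is an assumption made here so the
# result does not depend on the local time zone
dt[, c("V1", "V3") := lapply(.SD, as.POSIXct, tz = "UTC"), .SDcols = c("V1", "V3")]

# Mean elapsed minutes per name, converted to a plain number and rounded to 2 digits
res <- dt[, .(mean_mins = round(as.numeric(mean(difftime(V1, V3, units = "mins")),
                                           units = "mins"), 2)),
          by = V2]
print(res)
```

Converting with `as.numeric(..., units = "mins")` drops the difftime class, so `round()` returns an ordinary numeric column that is easier to export or plot.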

Data

df <- structure(list(V1 = structure(c(1L, 2L, 3L, 3L, 4L), .Label = c("2014-01-05 23:05:42", 
"2014-01-06 13:02:58", "2014-01-08 03:04:49", "2014-01-08 08:26:41"
), class = "factor"), V2 = structure(c(3L, 1L, 3L, 3L, 2L), .Label = c("Albert", 
"Marlen", "Nicole"), class = "factor"), V3 = structure(c(1L, 
2L, 3L, 3L, 4L), .Label = c("2014-01-05 22:41:26", "2014-01-06 11:58:14", 
"2014-01-08 02:49:58", "2014-01-08 05:45:08"), class = "factor")), .Names = c("V1", 
"V2", "V3"), class = "data.frame", row.names = c(NA, -5L))

For more information on data.table, see the package documentation.

Answer 1 (score: 1)

A similar option using dplyr (data from @DavidArenburg's post). We group by 'V2', convert the columns 'V1' and 'V3' to the POSIXct class using mutate_each, and use summarise to get the mean of the time difference between 'V1' and 'V3'.

library(dplyr)
df %>% 
  group_by(V2) %>% 
  mutate_each(funs(as.POSIXct(.)), V1, V3) %>% 
  summarise(DiffMean = mean(difftime(V1, V3, units="mins")))

#      V2       DiffMean
#  (fctr)         (dfft)
#1 Albert  64.73333 mins
#2 Marlen 161.55000 mins
#3 Nicole  17.98889 mins
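
In later dplyr releases `mutate_each()` and `funs()` are deprecated, so the same idea might be written with `across()` instead. The following is a minimal sketch, assuming dplyr 1.0.0 or newer; the re-created sample data, the character columns, and the `tz = "UTC"` argument are assumptions added for illustration and are not part of the original answer.

```r
library(dplyr)

# Sample rows from the question, re-created so the block is self-contained
# (character columns are an assumption; the original data used factors)
df <- data.frame(
  V1 = c("2014-01-05 23:05:42", "2014-01-06 13:02:58", "2014-01-08 03:04:49",
         "2014-01-08 03:04:49", "2014-01-08 08:26:41"),
  V2 = c("Nicole", "Albert", "Nicole", "Nicole", "Marlen"),
  V3 = c("2014-01-05 22:41:26", "2014-01-06 11:58:14", "2014-01-08 02:49:58",
         "2014-01-08 02:49:58", "2014-01-08 05:45:08"),
  stringsAsFactors = FALSE
)

res <- df %>%
  # Parse both timestamp columns; tz = "UTC" is an assumption made here
  mutate(across(c(V1, V3), ~ as.POSIXct(.x, tz = "UTC"))) %>%
  group_by(V2) %>%
  # Mean elapsed time (update minus creation) in minutes, per name
  summarise(DiffMean = mean(difftime(V1, V3, units = "mins")))

print(res)
```

Here the conversion happens before grouping, which keeps the grouped step limited to the summary itself; the per-name means are the same as in the output above.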