Split and aggregate a series of event logs into intervals

Time: 2017-08-04 08:54:44

Tags: r date aggregate

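Thanks to help from other users, I managed to split my dataset into sequences and aggregate the responses for each sequence. A sequence is defined by the occurrence of a stimulus (A or B); everything a user logs before his first stimulus of either kind forms the so-called 0 sequence. This means each user can have several sequences, depending on how many stimuli he received. Every user has an event log, and I split these logs according to the criteria above. I used the following code: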

library(dplyr)

#change the date into POSIXct format
df$Date <- as.POSIXct(strptime(df$Date, "%d.%m.%Y %H:%M"))

#arrange the dataframe according to User and Date
df <- arrange(df, User, Date)

#create a unique ID for each stimuli combination
df$stims <- with(df, paste(cumsum(StimuliA), cumsum(StimuliB), sep = "_"))

#aggregate the numeric eventlog columns according to the stimuli IDs
#(Date and User_Seq cannot be summed, so they are left out)
df1 <- aggregate(cbind(StimuliA, StimuliB, R2, R3, R4, R5, R6, R7) ~ User + stims,
                 data = df, FUN = sum)
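Because `cumsum()` above runs over the whole data frame, the counters carry over from one user to the next when there are several users; the `User + stims` grouping still keeps users apart, but later users' IDs continue from the previous user's totals instead of starting at 0_0. A minimal sketch, assuming dplyr is installed, that restarts the counters per user with `group_by()` and prefixes the user ID, which gives IDs in the same form as the `User_Seq` column of the sample data. The toy log `ev` is hypothetical.

```r
library(dplyr)

# Hypothetical toy event log: two users, stimulus flags only
ev <- data.frame(
  User     = c(1, 1, 1, 2, 2),
  Date     = as.POSIXct(c("2015-12-02 20:16", "2015-12-07 08:18",
                          "2016-01-05 11:35", "2015-12-03 10:00",
                          "2015-12-10 09:00"), tz = "UTC"),
  StimuliA = c(0, 1, 0, 1, 0),
  StimuliB = c(0, 0, 1, 0, 0)
)

ev <- ev %>%
  arrange(User, Date) %>%
  group_by(User) %>%
  # cumsum() restarts for every user because the data are grouped by User
  mutate(User_Seq = paste(User, cumsum(StimuliA), cumsum(StimuliB), sep = "_")) %>%
  ungroup()

ev$User_Seq
# "1_0_0" "1_1_0" "1_1_1" "2_1_0" "2_1_0"
```

User 2 gets no `2_0_0` sequence because his log starts with a stimulus.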

Source: Summarize and count data in R with dplyr

Dataset:

    structure(list(User = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L), Date = c("02.12.2015 20:16", "03.12.2015 20:17", 
"02.12.2015 20:44", "03.12.2015 09:32", "03.12.2015 09:33", "07.12.2015 08:18", 
"08.12.2015 19:40", "08.12.2015 19:43", "22.12.2015 18:22", "22.12.2015 18:23", 
"23.12.2015 14:18", "05.01.2016 11:35", "05.01.2016 13:21", "05.01.2016 13:22", 
"05.01.2016 13:22", "04.08.2016 08:25"), StimuliA = c(0L, 0L, 
0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L), StimuliB = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L), 
    R2 = c(1L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 
    0L, 0L, 0L), R3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 1L, 0L, 1L, 0L), R4 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), R5 = c(0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), R6 = c(0L, 
    0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L
    ), R7 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 
    0L, 1L, 0L, 0L), User_Seq = c("1_0_0", "1_0_0", "1_0_0", 
    "1_0_0", "1_0_0", "1_1_0", "1_1_0", "1_1_0", "1_1_0", "1_1_0", 
    "1_2_0", "1_2_1", "1_2_1", "1_2_1", "1_2_1", "1_2_2")), .Names = c("User", 
"Date", "StimuliA", "StimuliB", "R2", "R3", "R4", "R5", "R6", 
"R7", "User_Seq"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-16L), spec = structure(list(cols = structure(list(User = structure(list(), class = c("collector_integer", 
"collector")), Date = structure(list(), class = c("collector_character", 
"collector")), StimuliA = structure(list(), class = c("collector_integer", 
"collector")), StimuliB = structure(list(), class = c("collector_integer", 
"collector")), R2 = structure(list(), class = c("collector_integer", 
"collector")), R3 = structure(list(), class = c("collector_integer", 
"collector")), R4 = structure(list(), class = c("collector_integer", 
"collector")), R5 = structure(list(), class = c("collector_integer", 
"collector")), R6 = structure(list(), class = c("collector_integer", 
"collector")), R7 = structure(list(), class = c("collector_integer", 
"collector")), User_Seq = structure(list(), class = c("collector_character", 
"collector"))), .Names = c("User", "Date", "StimuliA", "StimuliB", 
"R2", "R3", "R4", "R5", "R6", "R7", "User_Seq")), default = structure(list(), class = c("collector_guess", 
"collector"))), .Names = c("cols", "default"), class = "col_spec"))

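My goal is to adapt this code so that it produces the same sequence summary, but with the responses split into two parts: one covering the first week after the stimulus date, and a second that sums all the remaining "lagged" responses.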

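I illustrate this in the example below. It would also be possible to do this in long format, with an extra column that marks lagged responses with 1/0 and keeps the same dates, but the ideal output would be in wide format.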

User  Date              StimuliA  StimuliB  Seq_ID  R2  R3  R4  R5  R6  R7  R2l  R3l  R4l  R5l  R6l  R7l
1     02.12.2015 20:16  0         0         1_0_0   4   0   0   0   1   0   0    0    0    0    0    0
1     07.12.2015 08:18  1         0         1_1_0   1   0   0   0   0   1   2    0    0    0    0    0
1     23.12.2015 14:18  1         0         1_2_0   0   0   0   0   0   0   0    0    0    0    0    0
1     05.01.2016 11:35  0         1         1_2_1   0   2   0   0   0   1   0    0    0    0    0    0
1     04.08.2016 08:25  0         1         1_2_2   0   0   0   0   0   0   0    0    0    0    0    0

For example, as you can see here, rows 9 and 10 of the sample are aggregated into R2l (Response 2 lagged) because they occurred more than one week after 07.12.2015 08:18.
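The one-week rule can be checked directly for sequence 1_1_0. A minimal base-R sketch that measures how long after the stimulus at 07.12.2015 08:18 rows 8, 9 and 10 occurred; the times are parsed as UTC, an assumption that keeps daylight-saving shifts out of the differences. Treating "7 days or more" as lagged is likewise an assumption about where the boundary falls.

```r
fmt   <- "%d.%m.%Y %H:%M"
# Stimulus that starts sequence 1_1_0
start <- as.POSIXct("07.12.2015 08:18", format = fmt, tz = "UTC")
# Rows 8, 9 and 10 of the sample
rows  <- as.POSIXct(c("08.12.2015 19:43", "22.12.2015 18:22", "22.12.2015 18:23"),
                    format = fmt, tz = "UTC")

# Time elapsed since the stimulus, in days
elapsed <- as.numeric(difftime(rows, start, units = "days"))

# 7 days or more after the stimulus counts as lagged
lagged <- elapsed >= 7
round(elapsed, 2)
# 1.48 15.42 15.42
lagged
# FALSE TRUE TRUE
```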

1 answer:

Answer 0 (score: 0)

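I found a way to solve the problem. Basically, I sort the data by sequence ID and Date and group it by sequence ID. Then I create a new column holding the earliest date of each sequence plus 7 days. After that, it is just a matter of comparing each row's date with this earliest-date-plus-7-days value, setting the group to 0 for rows in the first week and to 1 for all others.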

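library(dplyr)
df <- df %>%
        arrange(User_Seq, Date) %>%
        group_by(User_Seq) %>%
        # earliest date of the sequence plus 7 days (604800 seconds)
        mutate(Date7 = min(Date) + 604800) %>%
        # 0 = within the first week after the stimulus, 1 = lagged
        mutate(Group = ifelse(Date7 > Date, 0, 1)) %>%
        ungroup()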

After that, the data only needs to be reshaped into the wide format shown in the question.
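One way to do that final reshape, sketched under the assumption that dplyr (1.0 or later, for `across()`) and tidyr (for `pivot_wider()`) are installed. The sample is rebuilt from the question's `dput()` output as a plain data frame, each row is flagged with the same first-week test as the answer (using the question's `User_Seq` column as the sequence ID), the responses are summed per sequence and flag, and `pivot_wider()` spreads the two flags into the R2 to R7 and R2l to R7l columns. Reporting the earliest date and the summed stimulus flags per sequence is an interpretation of the desired table, not something the answer specifies.

```r
library(dplyr)
library(tidyr)

# The question's sample (User 1), rebuilt as a plain data frame
df <- data.frame(
  User = 1L,
  Date = c("02.12.2015 20:16", "03.12.2015 20:17", "02.12.2015 20:44",
           "03.12.2015 09:32", "03.12.2015 09:33", "07.12.2015 08:18",
           "08.12.2015 19:40", "08.12.2015 19:43", "22.12.2015 18:22",
           "22.12.2015 18:23", "23.12.2015 14:18", "05.01.2016 11:35",
           "05.01.2016 13:21", "05.01.2016 13:22", "05.01.2016 13:22",
           "04.08.2016 08:25"),
  StimuliA = c(0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L),
  StimuliB = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L),
  R2 = c(1L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L),
  R3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L),
  R4 = 0L,
  R5 = 0L,
  R6 = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  R7 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L),
  User_Seq = rep(c("1_0_0", "1_1_0", "1_2_0", "1_2_1", "1_2_2"), c(5, 5, 1, 4, 1)),
  stringsAsFactors = FALSE
)

wide <- df %>%
  mutate(Date = as.POSIXct(Date, format = "%d.%m.%Y %H:%M", tz = "UTC")) %>%
  arrange(User_Seq, Date) %>%
  group_by(User_Seq) %>%
  # same flag as the answer: 0 = first week after the sequence start, 1 = lagged
  mutate(Group = ifelse(min(Date) + 604800 > Date, 0, 1)) %>%
  # one start date and one stimulus count per sequence
  mutate(Date = min(Date), StimuliA = sum(StimuliA), StimuliB = sum(StimuliB)) %>%
  group_by(User, Date, StimuliA, StimuliB, User_Seq, Group) %>%
  summarise(across(R2:R7, sum), .groups = "drop") %>%
  # creates R2_0 to R7_0 and R2_1 to R7_1; missing combinations become 0
  pivot_wider(names_from = Group, values_from = R2:R7, values_fill = 0L) %>%
  # R2_0 -> R2 (first week), R2_1 -> R2l (lagged)
  rename_with(~ sub("_1$", "l", sub("_0$", "", .x))) %>%
  select(User, Date, StimuliA, StimuliB, User_Seq,
         all_of(paste0("R", 2:7)), all_of(paste0("R", 2:7, "l")))

wide
```

The `values_fill = 0L` matters for sequences that have no lagged rows at all; without it their `*l` columns would be `NA`.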