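Thanks to help from other users, I managed to split my data set into sequences and aggregate the responses for each sequence. A sequence is defined by the occurrence of a stimulus (A or B); everything that happens before a user's first stimulus is the so-called 0 sequence. This means each user can have several sequences, depending on how many stimuli he received. Each user has an event log, which I split according to the criterion above. I used the following code: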
library(dplyr)  # provides arrange()

# Convert the Date column to POSIXct
df$Date <- as.POSIXct(strptime(df$Date, "%d.%m.%Y %H:%M"))
# Sort the data frame by User and Date
df <- arrange(df, User, Date)
# Build a unique ID for each stimulus combination
df$stims <- with(df, paste(cumsum(StimuliA), cumsum(StimuliB), sep = "_"))
# Aggregate all event-log rows by user and stimulus ID
df1 <- aggregate(. ~ User + stims, data = df, sum)
Source: Summarize and count data in R with dplyr
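To make the sequence ID concrete, here is a minimal sketch on made-up vectors (not the data set from this question). It shows how the running counts of `StimuliA` and `StimuliB` form the ID. It also shows one thing to watch for: a plain `cumsum()` keeps counting across users, while `ave(..., FUN = cumsum)` restarts the count for each user. Whether a per-user restart is wanted here is an assumption on my part.

```r
# Hypothetical toy vectors, already sorted by User and Date
User     <- c(1L, 1L, 1L, 2L, 2L)
StimuliA <- c(0L, 1L, 0L, 0L, 1L)
StimuliB <- c(0L, 0L, 1L, 0L, 0L)

# Running counts over the whole table (as in the code above)
global <- paste(cumsum(StimuliA), cumsum(StimuliB), sep = "_")
# "0_0" "1_0" "1_1" "1_1" "2_1"

# Running counts restarted within each user
per_user <- paste(ave(StimuliA, User, FUN = cumsum),
                  ave(StimuliB, User, FUN = cumsum), sep = "_")
# "0_0" "1_0" "1_1" "0_0" "1_0"
```

Because the aggregation groups by `User + stims`, both variants keep users apart; they differ only in how the IDs of later users are numbered.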
Data set:
structure(list(User = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L), Date = c("02.12.2015 20:16", "03.12.2015 20:17",
"02.12.2015 20:44", "03.12.2015 09:32", "03.12.2015 09:33", "07.12.2015 08:18",
"08.12.2015 19:40", "08.12.2015 19:43", "22.12.2015 18:22", "22.12.2015 18:23",
"23.12.2015 14:18", "05.01.2016 11:35", "05.01.2016 13:21", "05.01.2016 13:22",
"05.01.2016 13:22", "04.08.2016 08:25"), StimuliA = c(0L, 0L,
0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L), StimuliB = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L),
R2 = c(1L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L,
0L, 0L, 0L), R3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 1L, 0L, 1L, 0L), R4 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), R5 = c(0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), R6 = c(0L,
0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L
), R7 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 1L, 0L, 0L), User_Seq = c("1_0_0", "1_0_0", "1_0_0",
"1_0_0", "1_0_0", "1_1_0", "1_1_0", "1_1_0", "1_1_0", "1_1_0",
"1_2_0", "1_2_1", "1_2_1", "1_2_1", "1_2_1", "1_2_2")), .Names = c("User",
"Date", "StimuliA", "StimuliB", "R2", "R3", "R4", "R5", "R6",
"R7", "User_Seq"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-16L), spec = structure(list(cols = structure(list(User = structure(list(), class = c("collector_integer",
"collector")), Date = structure(list(), class = c("collector_character",
"collector")), StimuliA = structure(list(), class = c("collector_integer",
"collector")), StimuliB = structure(list(), class = c("collector_integer",
"collector")), R2 = structure(list(), class = c("collector_integer",
"collector")), R3 = structure(list(), class = c("collector_integer",
"collector")), R4 = structure(list(), class = c("collector_integer",
"collector")), R5 = structure(list(), class = c("collector_integer",
"collector")), R6 = structure(list(), class = c("collector_integer",
"collector")), R7 = structure(list(), class = c("collector_integer",
"collector")), User_Seq = structure(list(), class = c("collector_character",
"collector"))), .Names = c("User", "Date", "StimuliA", "StimuliB",
"R2", "R3", "R4", "R5", "R6", "R7", "User_Seq")), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
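My goal is to adapt this code to produce the same per-sequence summary, but with the responses split into two parts: those within the first week after the stimulus date, and all later ("lagged") responses aggregated separately.
I illustrate this in the example below. It could also be done in long format with an extra column that flags lagged responses with 1/0 and keeps the same date, but the ideal output would be in wide format.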
User  Date              StimuliA  StimuliB  Seq_ID  R2  R3  R4  R5  R6  R7  R2l  R3l  R4l  R5l  R6l  R7l
1     02.12.2015 20:16  0         0         1_0_0   4   0   0   0   1   0   0    0    0    0    0    0
1     07.12.2015 08:18  1         0         1_1_0   1   0   0   0   0   1   2    0    0    0    0    0
1     23.12.2015 14:18  1         0         1_2_0   0   0   0   0   0   0   0    0    0    0    0    0
1     05.01.2016 11:35  0         1         1_2_1   0   2   0   0   0   1   0    1    0    0    0    0
1     04.08.2016 08:25  0         1         1_2_2   0   0   0   0   0   0   0    0    0    0    0    0
For example, as you can see here, rows 9 & 10 of the sample are aggregated into R2l (Response 2 lagged) because they occurred more than a week after 07.12.2015 08:18.
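The one-week cutoff can be checked directly on two dates from the sample. This is only a sketch: the `tz = "UTC"` setting is my assumption (it avoids daylight-saving shifts), and treating an event exactly 7 days after the start as lagged is a choice the question does not spell out.

```r
fmt <- "%d.%m.%Y %H:%M"

# Start of sequence 1_1_0 and two later events from the same sequence
start  <- as.POSIXct("07.12.2015 08:18", format = fmt, tz = "UTC")
events <- as.POSIXct(c("08.12.2015 19:40", "22.12.2015 18:22"),
                     format = fmt, tz = "UTC")

# POSIXct arithmetic is in seconds, so one week is 7 * 24 * 60 * 60
lagged <- events >= start + 7 * 24 * 60 * 60
lagged  # FALSE TRUE
```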
Answer 0 (score: 0)
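I found a solution to my problem. Basically, I sort the data by sequence ID (seqid) and Date and group it by seqid. Then I create a new column holding the earliest date of each sequence plus 7 days. After that, I only need to compare this cutoff with each row's date, setting the flag to 0 for events in the first week and to 1 for all later ones.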
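library(dplyr)

df <- df %>%
  arrange(seqid, Date) %>%
  group_by(seqid) %>%
  # End of the first week: earliest date in the sequence + 7 days (604800 s)
  mutate(Date7 = min(Date) + 604800) %>%
  # Group = 0 for events in the first week, 1 for lagged events
  mutate(Group = ifelse(Date7 > Date, 0, 1))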
After that, it only needs to be reshaped into the wide format shown in the question.
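The reshaping step is not shown above, so here is one possible sketch using `tidyr::pivot_wider()`. The toy data, the `period` labels, and the resulting column names (`R2_first`, `R2_lag`, ...) are my own assumptions for illustration; the question asks for `R2` and `R2l`, which could be obtained by renaming the columns afterwards.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy event log with only two response columns
events <- tibble(
  seqid = c("1_1_0", "1_1_0", "1_1_0", "1_2_0"),
  Date  = as.POSIXct(c("2015-12-07 08:18", "2015-12-08 19:40",
                       "2015-12-22 18:22", "2015-12-23 14:18"),
                     format = "%Y-%m-%d %H:%M", tz = "UTC"),
  R2    = c(0L, 1L, 1L, 0L),
  R7    = c(1L, 0L, 0L, 0L)
)

week_secs <- 7 * 24 * 60 * 60  # 604800 seconds

wide <- events %>%
  group_by(seqid) %>%
  mutate(
    Start  = min(Date),                                        # start of the sequence
    period = if_else(Date < Start + week_secs, "first", "lag") # first week vs. later
  ) %>%
  group_by(seqid, Start, period) %>%
  summarise(across(c(R2, R7), sum), .groups = "drop") %>%
  pivot_wider(
    names_from  = period,
    values_from = c(R2, R7),
    values_fill = 0L   # sequences without lagged events get 0 instead of NA
  )
# One row per sequence with columns
# seqid, Start, R2_first, R2_lag, R7_first, R7_lag
```

Note that the `*_lag` columns only appear if at least one sequence in the data has lagged events.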