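I have a data frame that holds a series of actions (column "action") for each session (column "session"). An action can be repeated within the same session (for example, session 01 is a -> b -> a), and I am interested in the order in which the actions occur: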
x <- data.frame(
  session = c("01","01","01","02","02","02","03","03"),
  action  = c("a","b","a","c","a","c","a","b"))
I need to convert this to transaction format so that I can apply the apriori algorithm with the 'arules' package. The desired output would be:
01 a,b,a
02 c,a,c
03 a,b
where each session lists its exact sequence of actions.
Which approach would you suggest?
Thanks.
Answer 0 (score: 1)
Using base R, we can use aggregate:
aggregate(action~ session, x, FUN = toString)
# session action
#1 01 a, b, a
#2 02 c, a, c
#3 03 a, b
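toString() separates the actions with ", " (comma plus space). If the exact "a,b,a" form from the question is wanted, a minimal variant is to pass paste() with collapse = "," as FUN; the sketch below assumes the sample data from the question, built with stringsAsFactors = FALSE so that action is a character column.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps action as character
x <- data.frame(session = c("01","01","01","02","02","02","03","03"),
                action  = c("a","b","a","c","a","c","a","b"),
                stringsAsFactors = FALSE)

# paste(collapse = ",") joins each session's actions in their original row order,
# without the space that toString() inserts
res <- aggregate(action ~ session, data = x,
                 FUN = function(v) paste(v, collapse = ","))
res
```

Because aggregate() splits the rows by session without reordering them within a group, each string keeps the sequence (including repeats) as it appears in the data.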
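If we need to convert to transactions (note that a transaction is a set of items, so repeated actions within a session are collapsed and their order is not kept):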
library(arules)
as(split(as.character(x$action), x$session), "transactions")
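A self-contained sketch of the transactions step, assuming the arules package is installed. It shows what the resulting object contains: three transactions, each holding the distinct actions of one session. The names seqs and trans are illustrative.

```r
library(arules)

x <- data.frame(session = c("01","01","01","02","02","02","03","03"),
                action  = c("a","b","a","c","a","c","a","b"),
                stringsAsFactors = FALSE)

# One character vector per session, in row order (repeats still present here)
seqs <- split(x$action, x$session)

# Coerce to transactions; each transaction is a set, so repeated actions are
# dropped (arules warns about duplicated items; the warning is suppressed here)
trans <- suppressWarnings(as(seqs, "transactions"))

inspect(trans)
```

After the coercion, session 01 holds {a, b} and session 02 holds {a, c}, so the a -> b -> a sequence is no longer visible. If order and repetition must survive, the string forms produced by aggregate or dplyr are what keep them.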
Answer 1 (score: 0)
x <- data.frame(session=c("01","01","01","02","02", "02","03","03"),
action=c("a","b","a","c","a","c", "a","b"))
library(dplyr)
x %>%
group_by(session) %>%
summarise(action = paste0(action, collapse = ","))
# # A tibble: 3 x 2
# session action
# <fct> <chr>
# 1 01 a,b,a
# 2 02 c,a,c
# 3 03 a,b