arules package - preparing data

Time: 2016-04-27 06:31:24

Tags: r math machine-learning

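How can I prepare data in "transactions" form so that, for each transaction ID, the time effect/order of the items is taken into account? I found that when I use the split function, the items end up sorted alphabetically.

For example: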

ID Items Sequence
1  D     1
1  A     2
1  C     3
2  A     1 
2  B     2

Desired output as transactions:

ID Items
1  D A C   # note that A comes after D, as dictated by the Sequence variable
2  A B

Regards.

1 answer:

Answer 0: (score: 0)

Using lapply and rbind:

DF = read.table(text="ID Items Sequence
1  D     1
1  A     2
1  C     3
2  A     1 
2  B     2",header=TRUE,stringsAsFactors=FALSE,na.strings="")


DF
#  ID Items Sequence
#1  1     D        1
#2  1     A        2
#3  1     C        3
#4  2     A        1
#5  2     B        2

For each ID, subset the data frame, sort the rows by Sequence, combine the items, and return the output for that ID:

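DF_new = do.call(rbind, lapply(unique(DF$ID), function(x) {
  # keep only the rows for this ID
  subset_DF = DF[DF$ID == x, ]
  # sort the rows (not the columns) by Sequence
  subset_DF = subset_DF[order(subset_DF$Sequence), ]
  subset_DF = subset_DF[, c("ID", "Items")]
  # collapse the ordered items into one space-separated string
  subset_DF$Items = paste0(subset_DF$Items, collapse = " ")
  # every row is now identical, so keep just one
  subset_DF = unique(subset_DF)
  rownames(subset_DF) = NULL
  return(subset_DF)
}))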

DF_new
#  ID Items
#1  1 D A C
#2  2   A B
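
The same per-ID strings can also be built by sorting the whole data frame once and then grouping. This is only an alternative sketch in base R, not the method used in the answer; the names DF_sorted, items_by_id and DF_alt are made up for illustration. It relies on base R's split keeping the elements of each group in their existing row order, so sorting by Sequence first carries the item order through.

```r
DF <- read.table(text = "ID Items Sequence
1 D 1
1 A 2
1 C 3
2 A 1
2 B 2", header = TRUE, stringsAsFactors = FALSE)

# Sort all rows by ID, then by Sequence within each ID
DF_sorted <- DF[order(DF$ID, DF$Sequence), ]

# Group the ordered items by ID; the result is a named list of character vectors
# (illustrative name, not part of the original answer)
items_by_id <- split(DF_sorted$Items, DF_sorted$ID)

# Collapse each group into one space-separated string, one row per ID
DF_alt <- data.frame(
  ID = as.integer(names(items_by_id)),
  Items = unname(vapply(items_by_id, paste, character(1), collapse = " ")),
  stringsAsFactors = FALSE
)

DF_alt
```

If the list form is more convenient than the collapsed strings, items_by_id can be used directly and the last step skipped.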