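How can I prepare data in "transactions" form while making sure that, for each transaction ID, the time effect/sequence is taken into account? I found that when I use the `split` function, the items end up sorted alphabetically.
For example: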
ID Items Sequence
1 D 1
1 A 2
1 C 3
2 A 1
2 B 2
Desired output in transactions form:
ID Items
1 D A C  # note that A comes after D because the Sequence variable dictates the order
2 A B
Regards.
Answer 0 (score: 0)
Using lapply and rbind:
DF = read.table(text="ID Items Sequence
1 D 1
1 A 2
1 C 3
2 A 1
2 B 2",header=TRUE,stringsAsFactors=FALSE,na.strings="")
DF
# ID Items Sequence
#1 1 D 1
#2 1 A 2
#3 1 C 3
#4 2 A 1
#5 2 B 2
For each ID, subset the data frame, sort the rows by Sequence, combine the items into one string, and return the result for that ID:
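DF_new = do.call(rbind, lapply(unique(DF$ID), function(x) {
  # keep only the rows that belong to this ID
  subset_DF = DF[DF$ID == x, ]
  # reorder the rows (not the columns) by Sequence
  subset_DF = subset_DF[order(subset_DF$Sequence), ]
  subset_DF = subset_DF[, c("ID", "Items")]
  # collapse the ordered items into one string, repeated on every row
  subset_DF$Items = paste0(subset_DF$Items, collapse = " ")
  # drop the now-identical duplicate rows so one row per ID remains
  subset_DF = unique(subset_DF)
  rownames(subset_DF) = NULL
  return(subset_DF)
}))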
DF_new
# ID Items
#1 1 D A C
#2 2 A B
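As an alternative sketch, assuming the only requirement is that items appear in Sequence order within each ID, you can sort the data frame once and then group it. `split` keeps elements in the order they arrive within each group, so sorting first keeps D ahead of A. The names `DF_sorted`, `items_by_id` and `DF_new2` are only illustrative and are not part of the original answer.

```r
DF <- read.table(text = "ID Items Sequence
1 D 1
1 A 2
1 C 3
2 A 1
2 B 2", header = TRUE, stringsAsFactors = FALSE)

# sort once by ID, then by Sequence within each ID
DF_sorted <- DF[order(DF$ID, DF$Sequence), ]

# a named list with one character vector of items per ID, in Sequence order
items_by_id <- split(DF_sorted$Items, DF_sorted$ID)

# the same grouping as a data frame with one collapsed string per ID;
# collapse = " " is passed through to paste()
DF_new2 <- aggregate(Items ~ ID, data = DF_sorted, FUN = paste, collapse = " ")
DF_new2
```

Whether a downstream transactions object keeps this order depends on the package that builds it, so it is worth checking the result after conversion.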