Converting a regular dataset into a format that market basket analysis can process

Time: 2015-07-07 12:37:23

Tags: r arules

I created the dataset below to run a market basket analysis (apriori()) on it:

id  name
1   mango
1   apple
1   grapes
2   apple
2   carrot
3   mango
3   apple
4   apple
4   carrot
4   grapes
5   strawberry
6   guava
6   strawberry
6   bananas
7   bananas
8   guava
8   strawberry
8   pineapple
9   mango
9   apple
9   blueberries
10  black grapes
11  pomogranate
12  black grapes
12  pomogranate
12  carrot
12  custard apple

I used the following code to convert it into data that market basket analysis can process:

library(arules)
fact <- data.frame(lapply(frt,as.factor))
trans <- as(fact, 'transactions') 

I also tried this, but it gave an error:

trans1 = read.transactions(file = frt, format = "single", sep = ",",cols=c("id","name"))

Error in scan(file = file, what = "", sep = sep, quiet = TRUE, nlines = 1) : 
  'file' must be a character string or connection

The output is not what I expected. This is what I got:

items                transactionID
1   {name=mango}                   1  
2   {name=apple}                   2  
3   {name=grapes}                  3  
4   {name=apple}                   4  
5   {name=carrot}                  5  
6   {name=mango}                   6  
7   {name=apple}                   7  
8   {name=apple}                   8  
9   {name=carrot}                  9  
10  {name=grapes}                  10 
11  {name=strawberry}              11 
12  {name=guava}                   12 
13  {name=strawberry}              13 
14  {name=bananas}                 14 

My expected output is:

id  item
1  {mango,apple,grapes}
2  {apple,carrot}
3  {mango,apple}

and so on.

Can anyone help me get the expected output (if possible)? That would let me apply the apriori() algorithm.

Thanks in advance.

1 Answer

Answer (score: 1)

If you are doing market basket analysis with arules, you need to build a transactions object. You can do this from a text file, as follows:

write.csv(frt,file="temp.csv", row.names=FALSE) # say "temp.csv" is your text file
tranx <- read.transactions(file="temp.csv",format="single", sep=",", cols=c("id","name"))
inspect(tranx)
#     items           transactionID
# 1  {apple,                      
#     grapes,                     
#     mango}                    1 
# 2  {black-grapes}             10
# 3  {pomogranate}              11
# 4  {black-grapes,               
#     carrot,                     
#     custard-apple,              
#     pomogranate}              12

...or, if you have already read the text file into a data.frame, you can coerce it to transactions via a list object:

tranx2 <- list()
for(i in unique(frt$id)){
  # assumes ids are consecutive integers 1, 2, ...; otherwise the list gets NULL gaps
  tranx2[[i]] <- as.character(frt$name[frt$id == i])
}

inspect(as(tranx2,'transactions'))

#   items          
# 1  {apple,        
#   grapes,       
#   mango}        
# 2  {apple,        
#   carrot}       
# 3  {apple,        
#   mango}        
# 4  {apple,        
#   carrot,       
#   grapes}