I created a dataset, shown below, on which I want to run market basket analysis (apriori()):
id name
1 mango
1 apple
1 grapes
2 apple
2 carrot
3 mango
3 apple
4 apple
4 carrot
4 grapes
5 strawberry
6 guava
6 strawberry
6 bananas
7 bananas
8 guava
8 strawberry
8 pineapple
9 mango
9 apple
9 blueberries
10 black grapes
11 pomogranate
12 black grapes
12 pomogranate
12 carrot
12 custard apple
I applied the following logic to convert it into transaction data for market basket analysis.
library(arules)
fact <- data.frame(lapply(frt,as.factor))
trans <- as(fact, 'transactions')
I also tried this, and it failed with an error.
trans1 = read.transactions(file = frt, format = "single", sep = ",",cols=c("id","name"))
Error in scan(file = file, what = "", sep = sep, quiet = TRUE, nlines = 1) :
'file' must be a character string or connection
The output I get is not what I expected. This is what I got:
items transactionID
1 {name=mango} 1
2 {name=apple} 2
3 {name=grapes} 3
4 {name=apple} 4
5 {name=carrot} 5
6 {name=mango} 6
7 {name=apple} 7
8 {name=apple} 8
9 {name=carrot} 9
10 {name=grapes} 10
11 {name=strawberry} 11
12 {name=guava} 12
13 {name=strawberry} 13
14 {name=bananas} 14
My expected output is:
id item
1 {mango,apple,grapes}
2 {apple,carrot}
3 {mango,apple}
and so on.
Could anyone help me get the expected output (if that is possible),
so that I can then apply the apriori() algorithm?
Thanks in advance.
Answer 0: (score: 1)
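If you are doing market basket analysis with arules, you need to construct a transactions object. Note that read.transactions() expects a file name or connection rather than a data.frame, which is why passing frt directly gave the "'file' must be a character string or connection" error. You can build the transactions from a text file like this: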
write.csv(frt,file="temp.csv", row.names=FALSE) # say "temp.csv" is your text file
tranx <- read.transactions(file="temp.csv",format="single", sep=",", cols=c("id","name"))
inspect(tranx)
# items transactionID
# 1 {apple,
# grapes,
# mango} 1
# 2 {black-grapes} 10
# 3 {pomogranate} 11
# 4 {black-grapes,
# carrot,
# custard-apple,
# pomogranate} 12
... or, if you have already read the text file into a data.frame, you can coerce it to transactions via a list object, like so:
tranx2 <- list()
for (i in unique(frt$id)) {
  # collect the item names of transaction i as a character vector
  tranx2[[i]] <- as.character(frt$name[frt$id == i])
}
inspect(as(tranx2, 'transactions'))
# items
# 1 {apple,
# grapes,
# mango}
# 2 {apple,
# carrot}
# 3 {apple,
# mango}
# 4 {apple,
# carrot,
# grapes}
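
The loop above can probably also be written with split(), which groups the item names by id in one step. This is only a minimal sketch: it rebuilds the first few rows of the question's data as a small frt data.frame so it runs on its own, and the support and confidence thresholds passed to apriori() are arbitrary values picked for illustration, not recommendations.

```r
library(arules)

# Assumption: a small stand-in for the asker's `frt`, using the first rows of the question's data.
frt <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3, 3),
  name = c("mango", "apple", "grapes", "apple", "carrot", "mango", "apple"),
  stringsAsFactors = FALSE
)

# split() returns a named list with one character vector of items per id.
baskets <- split(frt$name, frt$id)

# Drop repeated items inside a basket, since a transaction holds each item at most once.
baskets <- lapply(baskets, unique)

# Coerce the list to transactions; the list names become the transaction IDs.
trans <- as(baskets, "transactions")
inspect(trans)

# Mine rules from the transactions (thresholds are illustrative only).
rules <- apriori(trans, parameter = list(support = 0.3, confidence = 0.6))
inspect(rules)
```

With the full data you would skip the data.frame() call and use your own frt; the thresholds will need tuning, because most items in the sample data occur in only a few baskets.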