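I'm running this association analysis for an important university project, but I'm not sure about the results. I want to find associations between brands rather than between individual products. I'd appreciate feedback on my code, since I'm not confident in it. This is the code I'm using: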
library(readxl)  # read_excel()
library(plyr)    # ddply()
library(arules)  # read.transactions(), apriori(), inspect()

TechStore <- read_excel("C:/Desktop/sample data/TechSalesData.xlsx")
# Collapse the brands of each order (OrderNumber + OrderDate) into one comma-separated string
techtransactions <- ddply(TechStore, c("OrderNumber", "OrderDate"),
                          function(df1) paste(df1$Brand, collapse = ","))
# Drop the order number and date columns; only the basket string is needed
techtransactions$OrderNumber <- NULL
techtransactions$OrderDate <- NULL
# Rename the remaining column to items
colnames(techtransactions) <- c("items")
write.csv(techtransactions,"C:/Desktop/sample data/TechTransactions.csv", quote = FALSE, row.names = FALSE)
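# skip = 1 skips the "items" header line that write.csv() writes, so it is not read as a basket
TechTrans <- read.transactions("C:/Desktop/sample data/TechTransactions.csv", format = 'basket', sep=',', skip = 1)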
rules <- apriori(TechTrans, parameter = list(support = 0.001, confidence = 0.2, minlen=2), control = list(verbose = FALSE))
summary(rules)
inspect(sort(rules, by = "lift")[1:5])
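A possible variation on the basket-building step, assuming an order can list the same brand more than once (for example, two products from one brand). This is only a minimal base R sketch: the `orders` data frame is made-up toy data standing in for `TechStore`, and the names `orders`, `items` and `baskets` are hypothetical.

```r
# Toy data standing in for TechStore (hypothetical values)
orders <- data.frame(
  OrderNumber = c(1, 1, 1, 2, 2, 3),
  Brand = c("Dell", "Dell", "Case Logic", "HP", "iPhone", "Dell"),
  stringsAsFactors = FALSE
)

# Group brands by order, keep each brand once per order,
# then collapse each group into one comma-separated basket string
items <- vapply(
  split(orders$Brand, orders$OrderNumber),
  function(b) paste(unique(b), collapse = ","),
  character(1)
)

baskets <- data.frame(items = unname(items), stringsAsFactors = FALSE)
print(baskets)
```

The resulting `baskets` data frame has the same one-column shape as `techtransactions`, so it could be written out and read back the same way.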
This is the result:
> inspect(sort(rules, by = "lift")[1:5])
    lhs                               rhs          support     confidence lift     count
[1] {Dell,Lenovo,Toshiba}          => {Case Logic} 0.001699854 0.5833333  7.391282 7
[2] {Adventure Bags,Case Logic,HP} => {iPhone}     0.001214182 0.6250000  7.129501 5
[3] {Acer,Case Logic,Lenovo}       => {Toshiba}    0.001214182 0.5555556  6.426342 5
[4] {Acer,Lenovo,Toshiba}          => {Case Logic} 0.001214182 0.5000000  6.335385 5
[5] {Huawei,Lenovo,Targus}         => {Apple}      0.001214182 0.5000000  6.183183 5
(This is a sample dataset that uses brands rather than products.)
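One way to sanity-check the output above is to cross-check its columns against the standard definitions: support = count / number of transactions, confidence = count / count(lhs), and lift = confidence / support(rhs). The base R sketch below uses only the first two rows of the output, copied by hand; the variable names are hypothetical.

```r
# Values copied from rows [1] and [2] of the inspect() output
rules_out <- data.frame(
  support    = c(0.001699854, 0.001214182),
  confidence = c(0.5833333, 0.6250000),
  lift       = c(7.391282, 7.129501),
  count      = c(7, 5)
)

# support = count / n  =>  n = count / support
n_trans <- round(rules_out$count / rules_out$support)

# confidence = count / count(lhs)  =>  count(lhs) = count / confidence
lhs_count <- round(rules_out$count / rules_out$confidence)

# lift = confidence / support(rhs)  =>  support(rhs) = confidence / lift
rhs_support <- rules_out$confidence / rules_out$lift
rhs_count <- round(rhs_support * n_trans)

print(n_trans)    # 4118 4118
print(lhs_count)  # 12 8
print(rhs_count)  # 325 361
```

Under these definitions, rule [1] rests on 7 of about 12 transactions that match its left-hand side, out of roughly 4118 transactions in total. The high lift values therefore come from very small counts, which is worth keeping in mind when judging whether the rules are meaningful.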
Is this the right approach? I have no experience with association rules. Do the results look reasonable?
Thank you very much!
Best, Luca