I want to calculate the lift of previously defined (old) itemsets on a new list of transactions. This can be done with the interestMeasure function:
quality(old_itemsets)$lift_ref <- interestMeasure(old_itemsets,"lift",transactions = TransMat_ref, reuse = FALSE)
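The problem is that this does not work correctly. I know this because some of my itemsets contain only a single item. When the lift is calculated on the new transactions, it should be exactly 1 for these single-item itemsets, but it isn't!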
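For reference, the expectation follows from how lift is defined for an itemset. This assumes arules uses the usual definition for itemsets: the observed support divided by the support expected if the items occurred independently.

$$
\operatorname{lift}(X) = \frac{\operatorname{supp}(X)}{\prod_{i=1}^{k} \operatorname{supp}(\{x_i\})}, \qquad X = \{x_1, \dots, x_k\}
$$

For a single item ($k = 1$) this reduces to $\operatorname{supp}(\{x\}) / \operatorname{supp}(\{x\}) = 1$ whenever $\operatorname{supp}(\{x\}) > 0$. If the item never occurs in the new transactions, the ratio is $0/0$ and is undefined rather than 1.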
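I think the problem may be in my preprocessing. The transactions I used to generate the itemsets and the new transactions do not contain exactly the same items, so I add the items missing from one to the other and vice versa. Here is an example of how this is done in one direction: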
OldNames <- colnames(TransMat_old)
ReferenceNames <- colnames(TransMat_ref)
SetDiffNames <- setdiff(ReferenceNames, OldNames)
ItemsToAdd <- matrix(data = FALSE, nrow = length(TransMat_old), ncol = length(SetDiffNames))
colnames(ItemsToAdd) <- SetDiffNames
TransMat_old <- merge(TransMat_old, ItemsToAdd)
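As written above, I do this in both directions, so both transaction matrices contain all items. The catch is that the missing items are simply appended as extra columns, which means the items are in a different order in the two matrices!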
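The following base-R toy example shows why the column order matters. It is an illustration of the general idea and does not reproduce arules internals. It assumes that an itemset is stored as column positions rather than item names. Under that assumption, evaluating the itemset against a matrix whose columns are in a different order silently looks up different items. The matrices `old` and `new` and the helper `support_by_position` are hypothetical.

```r
# Hypothetical old transactions: {item1,item2,item3}, {item1,item3}, {item1,item2}
# (matrix() fills column by column, one column per item)
old <- matrix(c(TRUE, TRUE, TRUE,       # item1
                TRUE, FALSE, TRUE,      # item2
                TRUE, TRUE, FALSE,      # item3
                FALSE, FALSE, FALSE),   # item4 (appended, never present)
              nrow = 3,
              dimnames = list(NULL, c("item1", "item2", "item3", "item4")))

# Hypothetical new transactions: {item2,item3,item4}, {item3,item4}, {item2,item4}, {item2}
# item1 was appended last, so the column order differs from `old`.
new <- matrix(c(TRUE, FALSE, TRUE, TRUE,     # item2
                TRUE, TRUE, FALSE, FALSE,    # item3
                TRUE, TRUE, TRUE, FALSE,     # item4
                FALSE, FALSE, FALSE, FALSE), # item1 (appended)
              nrow = 4,
              dimnames = list(NULL, c("item2", "item3", "item4", "item1")))

# Support of an itemset given as column positions: share of rows containing all of them.
support_by_position <- function(m, cols) {
  mean(apply(m[, cols, drop = FALSE], 1, all))
}

# {item2} is coded as column 2 because that is where item2 sits in `old`.
itemset <- match("item2", colnames(old))

wrong <- support_by_position(new, itemset)   # column 2 of `new` is item3 -> 0.5
new_aligned <- new[, colnames(old)]          # reorder columns by name
right <- support_by_position(new_aligned, itemset)  # now really item2 -> 0.75

cat("support via position on misaligned matrix:", wrong, "\n")
cat("support after aligning columns by name:", right, "\n")
```

Reordering by name (`new[, colnames(old)]`) is the base-R analogue of giving both transaction sets the same item coding.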
Could this be the reason why the interestMeasure call at the top does not work?
Thanks in advance!
library(arules)
#create transactions
data <- paste(
"item1, item2, item3",
"item1, item3",
"item1, item2",
sep="\n")
cat(data)
write(data, file = "TransMat_Old")
data <- paste(
"item2, item3, item4",
"item3, item4",
"item2, item4",
"item2",
sep="\n")
cat(data)
write(data, file = "TransMat_New")
# load transactions
TransMat_Old <- read.transactions("TransMat_Old", format = "basket", sep=",")
TransMat_New <- read.transactions("TransMat_New", format = "basket", sep=",")
# Helper that appends the items of TransMat_New that are missing from TransMat_Old as all-FALSE columns
SameItems <- function(TransMat_Old, TransMat_New){
OldNames <- colnames(TransMat_Old)
NewNames <- colnames(TransMat_New)
SetDiffNames <- setdiff(NewNames, OldNames)
ItemsToAdd <- matrix(data = FALSE, nrow = length(TransMat_Old), ncol = length(SetDiffNames))
colnames(ItemsToAdd) <- SetDiffNames
TransMat_Data_allItems <- merge(TransMat_Old, ItemsToAdd)
return(TransMat_Data_allItems)
}
# Add items from one matrix to the other and vice versa
Combined1 <- SameItems(TransMat_Old, TransMat_New)
Combined2 <- SameItems(TransMat_New, TransMat_Old)
# Find itemsets in the old matrix
itemsets <- apriori(data=Combined1, parameter=list(supp=0.1, maxlen=2, target="frequent itemsets"))
inspect(itemsets)
#Calculate Lift for the itemsets
quality(itemsets)$lift_oldSet <- interestMeasure(itemsets,"lift", transactions = Combined1, reuse = FALSE)
#Calculate lift for old itemsets based on the new transaction matrix
quality(itemsets)$lift_newSet <- interestMeasure(itemsets,"lift", transactions = Combined2, reuse = FALSE)
# Single-item itemsets should have a lift of 1, but they don't.
inspect(itemsets)
As mentioned above, the single-item itemsets should have a lift of 1 on the new dataset, but they don't.
Answer (score: 1)
Just collect all item labels and recode both transaction sets:
all_item_labels <- union(itemLabels(TransMat_New),itemLabels(TransMat_Old))
TransMat_Old <- recode(TransMat_Old, itemLabels = all_item_labels)
TransMat_New <- recode(TransMat_New, itemLabels = all_item_labels)
Now both transaction sets contain the same items in the same order and are compatible with each other.
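Here is a minimal end-to-end sketch of the question's workflow with the recode step applied instead of the manual column appending. It builds the same toy transactions from lists rather than files. It assumes `recode()` is available in your arules version; newer releases may emit a deprecation warning for it. The final check treats a `NaN` lift as acceptable, because an item that never occurs in the new data gives 0/0.

```r
library(arules)

# Same toy data as in the question, built from lists instead of files.
TransMat_Old <- as(list(c("item1", "item2", "item3"),
                        c("item1", "item3"),
                        c("item1", "item2")), "transactions")
TransMat_New <- as(list(c("item2", "item3", "item4"),
                        c("item3", "item4"),
                        c("item2", "item4"),
                        c("item2")), "transactions")

# Give both transaction sets the same item labels in the same order.
all_item_labels <- union(itemLabels(TransMat_New), itemLabels(TransMat_Old))
TransMat_Old <- recode(TransMat_Old, itemLabels = all_item_labels)
TransMat_New <- recode(TransMat_New, itemLabels = all_item_labels)

# Mine itemsets on the old data only.
itemsets <- apriori(TransMat_Old,
                    parameter = list(supp = 0.1, maxlen = 2,
                                     target = "frequent itemsets"),
                    control = list(verbose = FALSE))

# Lift on the old and on the new transactions (recomputed, not reused).
quality(itemsets)$lift_oldSet <- interestMeasure(itemsets, "lift",
                                                 transactions = TransMat_Old,
                                                 reuse = FALSE)
quality(itemsets)$lift_newSet <- interestMeasure(itemsets, "lift",
                                                 transactions = TransMat_New,
                                                 reuse = FALSE)
inspect(itemsets)

# Single-item itemsets: lift should be 1 (or NaN if the item is absent from the new data).
single_lift_new <- quality(itemsets)$lift_newSet[size(itemsets) == 1]
print(single_lift_new)
```

Because the itemsets are mined on the recoded `TransMat_Old`, their item codes refer to the same columns as in the recoded `TransMat_New`. That is what makes the second `interestMeasure()` call meaningful.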