How to extract the longest apriori rules (association rules)

Time: 2016-05-29 18:14:14

Tags: r labels apriori

When running the following example:

library("arules")
data("Adult")
## Mine association rules.
rules <- apriori(Adult,parameter = list(supp = 0.5, conf = 0.9, target = "rules"))
labels(rules)

You will see the following rules:

[5] "{sex=Male} => {capital-gain=None}"  
[20] "{race=White,sex=Male} => {capital-gain=None}"
[22] "{sex=Male,native-country=United-States} => {capital-gain=None}" 

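These rules have the same RHS but differ in the LHS. I want to keep only the rules with the longest LHS and drop the shorter ones. In the example above, I want to drop rule [5], because its LHS ({sex=Male}) is contained in the LHS of [20] and [22]. I only want to use the longest rules (in other examples, the longest rule may contain 3 or more components).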

1 answer:

Answer 0: (score: 1)

Use is.subset to get a logical matrix, then use that matrix to locate the rules that are not proper subsets of another rule:

subsets <- is.subset(rules, proper = TRUE)
subsets[lower.tri(subsets, diag=TRUE)] <- 0 # set lower triangle to 0
notsubsets <- rowSums(subsets) == 0L
labels(rules[notsubsets])


# [1] "{capital-gain=None,hours-per-week=Full-time} => {capital-loss=None}"                      
# [2] "{capital-loss=None,hours-per-week=Full-time} => {capital-gain=None}"                      
# [3] "{race=White,sex=Male} => {capital-gain=None}"                                             
# [4] "{race=White,sex=Male,native-country=United-States} => {capital-loss=None}"                
# [5] "{race=White,sex=Male,capital-loss=None} => {native-country=United-States}"                
# [6] "{sex=Male,capital-loss=None,native-country=United-States} => {race=White}"                
# [7] "{sex=Male,capital-gain=None,native-country=United-States} => {capital-loss=None}"         
# [8] "{workclass=Private,race=White,native-country=United-States} => {capital-loss=None}"       
# [9] "{workclass=Private,race=White,capital-loss=None} => {native-country=United-States}"       
#[10] "{workclass=Private,race=White,capital-gain=None} => {capital-loss=None}"                  
#[11] "{workclass=Private,race=White,capital-loss=None} => {capital-gain=None}"                  
#[12] "{workclass=Private,capital-gain=None,native-country=United-States} => {capital-loss=None}"
#[13] "{workclass=Private,capital-loss=None,native-country=United-States} => {capital-gain=None}"
#[14] "{race=White,capital-gain=None,native-country=United-States} => {capital-loss=None}"       
#[15] "{race=White,capital-loss=None,native-country=United-States} => {capital-gain=None}"       
#[16] "{race=White,capital-gain=None,capital-loss=None} => {native-country=United-States}"

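The problem with this approach is that is.subset also counts the right-hand side when deciding whether one rule is contained in another. As mentioned in the comments, the method above misses the rule {sex=Male,native-country=United-States} => {capital-gain=None}: taken together, its LHS and RHS items are all contained in rule 43, even though the two rules have different right-hand sides.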

labels(rules[c(22, 43)])
#[1] "{sex=Male,native-country=United-States} => {capital-gain=None}"                  
#[2] "{sex=Male,capital-gain=None,native-country=United-States} => {capital-loss=None}"
is.subset(rules[22], rules[43])

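To catch such cases you can use <= 1L instead of == 0L, but you will then also get false positives: for example, "{sex=Male,capital-gain=None} => {capital-loss=None}" is kept even though it is a subset of {sex=Male,capital-gain=None,native-country=United-States} => {capital-loss=None}.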
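
One way to sidestep the right-hand-side issue might be to compare the two sides separately: drop a rule only when some other rule has the same RHS and a strictly larger LHS. The following is a minimal sketch rather than part of the original answer. It assumes that is.subset accepts the itemMatrix objects returned by lhs() and rhs(), and that its result can be converted with as.matrix (recent arules versions return a sparse matrix by default). The names lhs_proper, rhs_sub, same_rhs, covered and kept are made up for illustration.

```r
library("arules")
data("Adult")
rules <- apriori(Adult, parameter = list(supp = 0.5, conf = 0.9, target = "rules"))

# Entry [i, j] is TRUE when the LHS of rule i is a proper subset of the LHS of rule j.
lhs_proper <- as.matrix(is.subset(lhs(rules), lhs(rules), proper = TRUE))

# Two right-hand sides are equal when each one is a subset of the other.
rhs_sub <- as.matrix(is.subset(rhs(rules), rhs(rules)))
same_rhs <- rhs_sub & t(rhs_sub)

# A rule is "covered" if a rule with the same RHS and a longer LHS exists.
covered <- rowSums(lhs_proper & same_rhs) > 0
kept <- labels(rules[!covered])
kept
```

With this check, {sex=Male} => {capital-gain=None} should be dropped because [20] has the same RHS and a larger LHS, while a rule is no longer discarded merely because its items also appear in a rule with a different RHS.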