Find total number of combinations of values available in a vector in R

Time: 2018-12-03 13:16:41

Tags: r machine-learning analytics

The names of all 11 input variables are stored in the vector x below. "qua" is the name of the output variable.

    x <- c("fa", "va", "ca", 
           "rs", "chl", "fsd",
           "tsd", "den",   "pH", 
           "sul", "alc")

I am trying to fit a classification model for every possible combination of the input variables and record each model's AIC. So far I can only do this with one input variable at a time, as in the code below:

    library(MASS)  # provides polr()

    # Empty data frame that will collect one row per fitted model.
    var_aic <- data.frame(Variable = character(), AIC = numeric())

    # Fit one model per input variable and store its AIC together
    # with the variable name.
    for (i in seq_along(x)) {
      f <- as.formula(paste("qua ~", x[i]))  # separate name, so x is not overwritten
      model <- polr(f, data = train, Hess = TRUE)
      temp <- data.frame(Variable = x[i], AIC = AIC(model))
      var_aic <- rbind(var_aic, temp)
    }

Now I want to build a function that gives me a result like this:

    Variable               AIC
    fa                     1460.9
    va                     1399.4
    ca                     1678
    rs                     1460.9
    chl                    1399.4
    fsd                    1678
    tsd                    1460.9
    den                    1399.4
    pH                     1678
    sul                    1460.9
    alc                    1399.4
    fa + va                1233
    fa + ca                1800

    # Also, I don't want repetitions of the same variable, such as fa + fa.

I am stuck on this part. What should I change or add to make it work?

1 answer:

Answer 0 (score: 0)

    # For each subset size y, build every combination of y variable names
    # and collapse it into a single "a + b + c" string.
    combi <- lapply(seq_along(x),
      function(y) apply(combn(x, y), 2, paste, collapse = " + ")
    )

    combi.v <- unlist(combi)

    length(combi.v) == sum(choose(length(x), 1:length(x)))
    # TRUE

    tail(combi.v)
    # [1] "fa + va + ca + rs + fsd + tsd + den + pH + sul + alc"
    # [2] "fa + va + ca + chl + fsd + tsd + den + pH + sul + alc"
    # [3] "fa + va + rs + chl + fsd + tsd + den + pH + sul + alc"
    # [4] "fa + ca + rs + chl + fsd + tsd + den + pH + sul + alc"
    # [5] "va + ca + rs + chl + fsd + tsd + den + pH + sul + alc"
    # [6] "fa + va + ca + rs + chl + fsd + tsd + den + pH + sul + alc"
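
The combination strings can then be turned into formulas and fitted one by one. What follows is only a rough sketch of how that might look, not a tested solution for the data in the question. The helper name `aic_all_subsets` and its `fit_fun` argument are made up for illustration. Since the `train` data is not available here, the demonstration uses `lm()` on the built-in `mtcars` data as a stand-in for `polr()`.

```r
# Hypothetical helper (not from the question): fits one model per non-empty
# subset of `predictors` and returns the AIC of each model.
# `fit_fun` is assumed to be any fitting function that takes a formula and a
# `data` argument, such as lm() or MASS::polr().
aic_all_subsets <- function(response, predictors, data, fit_fun, ...) {
  # Right-hand sides for every subset size 1..n, e.g. "a", "b", "a + b".
  rhs <- unlist(lapply(seq_along(predictors), function(k) {
    apply(combn(predictors, k), 2, paste, collapse = " + ")
  }))

  # Fit each model and extract its AIC.
  aic <- vapply(rhs, function(terms) {
    f <- as.formula(paste(response, "~", terms))
    AIC(fit_fun(f, data = data, ...))
  }, numeric(1))

  data.frame(Variable = rhs, AIC = unname(aic), stringsAsFactors = FALSE)
}

# Stand-in demonstration: lm() on mtcars, chosen only so the sketch runs
# without the question's `train` data.
res <- aic_all_subsets("mpg", c("wt", "hp", "qsec"), mtcars, lm)
res[order(res$AIC), ]
```

With the question's objects, the call would presumably be `aic_all_subsets("qua", x, train, MASS::polr, Hess = TRUE)`, provided `x` still holds the original 11 names. That fits 2^11 - 1 = 2047 models, so it may take a while to run.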