I did a three-class classification using an SVM with a linear kernel. For this task I used the mlr package; the SVM itself comes from the kernlab package.
library(mlr)
library(kernlab)
print(filtered_task)
Supervised task: dtm
Type: classif
Target: target_lable
Observations: 1462
Features:
numerics factors ordered functionals
291 0 0 0
Missings: FALSE
Has weights: FALSE
Has blocking: FALSE
Has coordinates: FALSE
Classes: 3
negative neutral positive
917 309 236
Positive class: NA
lrn = makeLearner("classif.ksvm", par.vals = list(kernel = "vanilladot"))
mod = mlr::train(lrn, train_task)
Now I want to know which features have the highest weights for each class. Any idea how to get there?
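What I have tried so far, as a sketch: unwrap the fitted ksvm object from the mlr WrappedModel and reconstruct the linear weight vectors from the support vectors. Note that kernlab fits multiclass C-svc as one-vs-one binary problems, so this yields one weight vector per class *pair*, not per class. The iris task below stands in for my train_task, since that object is not shown here.

```r
library(mlr)
library(kernlab)

# Stand-in three-class task; replace with your own train_task.
task = makeClassifTask(data = iris, target = "Species")
lrn  = makeLearner("classif.ksvm", par.vals = list(kernel = "vanilladot"))
mod  = mlr::train(lrn, task)

# Unwrap the underlying ksvm object from the mlr WrappedModel.
ksvm_mod = getLearnerModel(mod)

# For a linear kernel the weight vector of each one-vs-one sub-problem is
# the coefficient-weighted sum of its support vectors:
# w = sum_i coef_i * x_i.
weights = lapply(seq_along(coef(ksvm_mod)), function(i) {
  colSums(coef(ksvm_mod)[[i]] * xmatrix(ksvm_mod)[[i]])
})
```

With 3 classes this gives 3 pairwise weight vectors; I am not sure how to aggregate them into one importance value per class.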
Moreover, it would be nice to get the feature weights for each class for the cross-validation result.
rdesc = makeResampleDesc("CV",
                         iters = 10,
                         stratify = TRUE)
set.seed(3)
r = resample(lrn, filtered_task, rdesc)
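For the cross-validation part, one idea I have tried (a sketch, reusing the pairwise weight extraction from kernlab and assuming filtered_task and rdesc from above) is the extract argument of resample, which applies a function to each fold's fitted model:

```r
library(mlr)
library(kernlab)

# Pull the pairwise linear weight vectors out of every fold's model.
r = resample(lrn, filtered_task, rdesc,
             extract = function(mod) {
               km = getLearnerModel(mod)
               lapply(seq_along(coef(km)), function(i)
                 colSums(coef(km)[[i]] * xmatrix(km)[[i]]))
             })

# r$extract is then a list with one entry per fold, each holding the
# one-vs-one weight vectors of that fold's SVM.
```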
I know that it is possible to calculate the feature importance as below, which, because of the Monte-Carlo iterations, is comparable to a cross-validated estimate.
imp = generateFeatureImportanceData(task = train_task,
method = "permutation.importance",
learner = lrn,
nmc = 10)
However, with this method I can't get the feature importance for each class, only the overall importance.
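The closest I have come to a per-class variant is a hand-rolled sketch (not an mlr API; per_class_importance is a hypothetical helper): shuffle one feature at a time and measure the drop in class-wise recall, averaged over permutations. It assumes the mod and train_task objects from above.

```r
library(mlr)

# Hypothetical helper: mean drop in per-class recall when `feature`
# is permuted, averaged over `nperm` shuffles.
per_class_importance = function(mod, task, feature, nperm = 10) {
  df  = getTaskData(task)
  tgt = getTaskTargetNames(task)

  # Baseline per-class recall on the unshuffled data.
  base = table(df[[tgt]], predict(mod, task)$data$response)
  base_recall = diag(base) / rowSums(base)

  drops = replicate(nperm, {
    df2 = df
    df2[[feature]] = sample(df2[[feature]])  # break the feature-target link
    cm = table(df2[[tgt]], predict(mod, newdata = df2)$data$response)
    base_recall - diag(cm) / rowSums(cm)
  })
  rowMeans(drops)  # one importance value per class
}
```

I don't know whether something like this already exists in mlr, which is why I am asking.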
library(dplyr)
library(reshape2)
library(ggplot2)
imp_data = melt(imp$res[, 2:ncol(imp$res)])
imp_data = imp_data %>%
arrange(-value)
imp_data[1:10,] %>%
ggplot(aes(x = reorder(variable, value), y = value)) +
geom_bar(stat = "identity", fill = "darkred") +
labs(x = "Features", y = "Permutation Importance") +
coord_flip() +
theme_minimal()