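I have fitted a model and would like to look at (and plot) its partial dependence.
For this task I am using the mlr package. However, since I have 80 features, I only want to look at the ones with the greatest influence on the target variable. Is there a way to compute or display the partial dependence only for the most influential features?
Here is an example that uses only 4 features. Suppose I only want to see or compute the partial dependence of the two most influential of them.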
library(mlr)
pd = generatePartialDependenceData(mod, train_task, c("diveyTrue", "dinnerTrue", "BikeParkingTrue", "latenightTrue"))
pd
PartialDependenceData
Task: dat
Features: diveyTrue, dinnerTrue, BikeParkingTrue, latenightTrue
Target: diveyTrue, dinnerTrue, BikeParkingTrue, latenightTrue
Derivative: FALSE
Interaction: FALSE
Individual: FALSE
   review_count diveyTrue dinnerTrue BikeParkingTrue latenightTrue
1:     73.92993 0.0000000         NA              NA            NA
2:     73.68386 0.1111111         NA              NA            NA
3:     73.68386 0.2222222         NA              NA            NA
4:     73.68386 0.3333333         NA              NA            NA
5:     73.68386 0.4444444         NA              NA            NA
6:     63.56335 0.5555556         NA              NA            NA
... (#rows: 40, #cols: 5)
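The task is a regression, and the first column is the target variable. All other variables are dummy variables, so the predicted target stays constant until the value of "diveyTrue" exceeds 0.5.
Here is a small dput():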
structure(list(data = structure(list(review_count = c(73.9299260484918,
73.6838552698629, 73.6838552698629, 73.6838552698629, 73.6838552698629,
63.5633491608329, 63.5633491608329, 63.5633491608329, 63.5633491608329,
63.5633491608329, 44.123492893074, 44.0855985404284, 44.0855985404284,
44.0855985404284, 44.0855985404284, 67.9185575263356, 67.9185575263356,
67.9185575263356, 67.9185575263356, 67.9185575263356, 64.1248331786005,
64.1243679505065, 64.1243679505065, 64.1243679505065, 64.1243679505065,
64.9177431842816, 64.9177431842816, 64.9177431842816, 64.9177431842816,
64.9177431842816, 58.2709529252224, 58.2709529252224, 58.2709529252224,
58.2709529252224, 58.2709529252224, 89.8281204749236, 89.8281204749236,
89.8281204749236, 89.8281204749236, 89.8281204749236), diveyTrue = c(0,
0.111111111111111, 0.222222222222222, 0.333333333333333, 0.444444444444444,
0.555555555555556, 0.666666666666667, 0.777777777777778, 0.888888888888889,
1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
dinnerTrue = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0,
0.111111111111111, 0.222222222222222, 0.333333333333333,
0.444444444444444, 0.555555555555556, 0.666666666666667,
0.777777777777778, 0.888888888888889, 1, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA), BikeParkingTrue = c(NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, 0.111111111111111,
0.222222222222222, 0.333333333333333, 0.444444444444444,
0.555555555555556, 0.666666666666667, 0.777777777777778,
0.888888888888889, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA), latenightTrue = c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, 0, 0.111111111111111, 0.222222222222222,
0.333333333333333, 0.444444444444444, 0.555555555555556,
0.666666666666667, 0.777777777777778, 0.888888888888889,
1)), row.names = c(NA, -40L), class = c("data.table", "data.frame"
)), task.desc = structure(list(
id = "dat", type = "regr", target = "review_count", size = 9943L,
n.feat = c(numerics = 79L, factors = 0L, ordered = 0L, functionals = 0L
), has.missings = TRUE, has.weights = FALSE, has.blocking = FALSE,
has.coordinates = FALSE), class = c("RegrTaskDesc", "SupervisedTaskDesc",
"TaskDesc")), target = c("diveyTrue", "dinnerTrue", "BikeParkingTrue",
"latenightTrue"), features = c("diveyTrue", "dinnerTrue", "BikeParkingTrue",
"latenightTrue"), derivative = FALSE, interaction = FALSE, individual = FALSE), class = "PartialDependenceData")
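To make "most influential" concrete, here is a rough sketch of one possible criterion: rank the features by how far their partial dependence curve moves the prediction (maximum minus minimum) and keep the top two. This is only an assumption about what "influence" should mean, not something mlr does by itself. The values are copied from the dput() above (rounded to 5 decimals), and `pd_long`, `effect_range`, `n_top` and `top_features` are hypothetical names used for illustration.

```r
# Partial dependence values copied from the dput() output above
# (rounded to 5 decimals); 10 grid points per feature.
pd_values <- c(
  73.92993, rep(73.68386, 4), rep(63.56335, 5),  # diveyTrue
  44.12349, rep(44.08560, 4), rep(67.91856, 5),  # dinnerTrue
  64.12483, rep(64.12437, 4), rep(64.91774, 5),  # BikeParkingTrue
  rep(58.27095, 5), rep(89.82812, 5)             # latenightTrue
)

# Long format: one row per (feature, grid point)
pd_long <- data.frame(
  feature = rep(c("diveyTrue", "dinnerTrue", "BikeParkingTrue", "latenightTrue"),
                each = 10),
  review_count = pd_values
)

# Assumed influence measure: spread (max - min) of each feature's curve
effect_range <- tapply(pd_long$review_count, pd_long$feature,
                       function(v) diff(range(v)))

# Names of the n_top features with the widest curves
n_top <- 2
top_features <- names(sort(effect_range, decreasing = TRUE))[seq_len(n_top)]
print(top_features)
```

With the real object, `top_features` could then be passed as the feature vector in a call like the `generatePartialDependenceData(mod, train_task, ...)` one shown earlier, so that only those features are plotted. Note that this ranking still needs the partial dependence of all features to be computed once; avoiding that would require ranking the features some other way first, for example with a feature importance measure.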