Filter partial dependence by effect size

Asked: 2019-03-28 11:45:08

Tags: r dependencies mlr

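I have fitted a model and would like to inspect (and plot) its partial dependence. I am using the mlr package for this. However, since I have 80 features, I only want to look at the ones with the largest effect on the target variable. Is there a way to compute or display the partial dependence only for the most influential features?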

Here is an example using only 4 features. Suppose I only want to see, or compute, the partial dependence of the two most influential ones.

library(mlr)
pd = generatePartialDependenceData(mod, train_task, c("diveyTrue", "dinnerTrue", "BikeParkingTrue", "latenightTrue"))

pd
PartialDependenceData
Task: dat
Features: diveyTrue, dinnerTrue, BikeParkingTrue, latenightTrue
Target: diveyTrue, dinnerTrue, BikeParkingTrue, latenightTrue
Derivative: FALSE
Interaction: FALSE
Individual: FALSE
   review_count diveyTrue dinnerTrue BikeParkingTrue latenightTrue
1:     73.92993 0.0000000         NA              NA            NA
2:     73.68386 0.1111111         NA              NA            NA
3:     73.68386 0.2222222         NA              NA            NA
4:     73.68386 0.3333333         NA              NA            NA
5:     73.68386 0.4444444         NA              NA            NA
6:     63.56335 0.5555556         NA              NA            NA
... (#rows: 40, #cols: 5)

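The task is a regression, and the first column is the target variable. All other variables are dummy variables, which is why the target stays essentially constant until the value of "diveyTrue" exceeds 0.5.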

Here is a small dput():

 structure(list(data = structure(list(review_count = c(73.9299260484918, 
73.6838552698629, 73.6838552698629, 73.6838552698629, 73.6838552698629, 
63.5633491608329, 63.5633491608329, 63.5633491608329, 63.5633491608329, 
63.5633491608329, 44.123492893074, 44.0855985404284, 44.0855985404284, 
44.0855985404284, 44.0855985404284, 67.9185575263356, 67.9185575263356, 
67.9185575263356, 67.9185575263356, 67.9185575263356, 64.1248331786005, 
64.1243679505065, 64.1243679505065, 64.1243679505065, 64.1243679505065, 
64.9177431842816, 64.9177431842816, 64.9177431842816, 64.9177431842816, 
64.9177431842816, 58.2709529252224, 58.2709529252224, 58.2709529252224, 
58.2709529252224, 58.2709529252224, 89.8281204749236, 89.8281204749236, 
89.8281204749236, 89.8281204749236, 89.8281204749236), diveyTrue = c(0, 
0.111111111111111, 0.222222222222222, 0.333333333333333, 0.444444444444444, 
0.555555555555556, 0.666666666666667, 0.777777777777778, 0.888888888888889, 
1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
    dinnerTrue = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, 
    0.111111111111111, 0.222222222222222, 0.333333333333333, 
    0.444444444444444, 0.555555555555556, 0.666666666666667, 
    0.777777777777778, 0.888888888888889, 1, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA), BikeParkingTrue = c(NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, 0.111111111111111, 
    0.222222222222222, 0.333333333333333, 0.444444444444444, 
    0.555555555555556, 0.666666666666667, 0.777777777777778, 
    0.888888888888889, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA), latenightTrue = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, 0, 0.111111111111111, 0.222222222222222, 
    0.333333333333333, 0.444444444444444, 0.555555555555556, 
    0.666666666666667, 0.777777777777778, 0.888888888888889, 
    1)), row.names = c(NA, -40L), class = c("data.table", "data.frame"
), .internal.selfref = <pointer: 0x0000000002521ef0>), task.desc = structure(list(
    id = "dat", type = "regr", target = "review_count", size = 9943L, 
    n.feat = c(numerics = 79L, factors = 0L, ordered = 0L, functionals = 0L
    ), has.missings = TRUE, has.weights = FALSE, has.blocking = FALSE, 
    has.coordinates = FALSE), class = c("RegrTaskDesc", "SupervisedTaskDesc", 
"TaskDesc")), target = c("diveyTrue", "dinnerTrue", "BikeParkingTrue", 
"latenightTrue"), features = c("diveyTrue", "dinnerTrue", "BikeParkingTrue", 
"latenightTrue"), derivative = FALSE, interaction = FALSE, individual = FALSE), class = "PartialDependenceData")
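
One possible way to rank the features, sketched under assumptions: treat the range (max minus min) of each feature's partial dependence curve as its "effect size" and keep the features with the largest ranges. This heuristic is an assumption, not something mlr provides as such. The table `pd_data` below is a hand-built stand-in for `pd$data`, with values rounded from the dput() above; the names `pd_data`, `effect` and `top_features` are hypothetical.

```r
# Stand-in for pd$data: one block of 10 grid points per feature,
# with NA in the columns of the features that are not being varied.
grid <- seq(0, 1, length.out = 10)
pd_data <- data.frame(
  review_count = c(
    73.93, rep(73.68, 4), rep(63.56, 5),   # rows where diveyTrue varies
    44.12, rep(44.09, 4), rep(67.92, 5),   # rows where dinnerTrue varies
    rep(64.12, 5), rep(64.92, 5),          # rows where BikeParkingTrue varies
    rep(58.27, 5), rep(89.83, 5)           # rows where latenightTrue varies
  ),
  diveyTrue       = c(grid, rep(NA, 30)),
  dinnerTrue      = c(rep(NA, 10), grid, rep(NA, 20)),
  BikeParkingTrue = c(rep(NA, 20), grid, rep(NA, 10)),
  latenightTrue   = c(rep(NA, 30), grid)
)

target   <- "review_count"
features <- setdiff(names(pd_data), target)

# Assumed effect-size heuristic: range of the partial dependence curve,
# computed over the rows that belong to the given feature.
effect <- sapply(features, function(f) {
  curve <- pd_data[[target]][!is.na(pd_data[[f]])]
  max(curve) - min(curve)
})

# Keep the two features whose curves move the prediction the most.
top_features <- names(sort(effect, decreasing = TRUE))[1:2]
print(round(effect, 2))
print(top_features)
```

On this data the two selected features are latenightTrue and dinnerTrue. With the real objects, `pd$data` would take the place of `pd_data`, and the selection could then be passed on, for example `plotPartialDependence(generatePartialDependenceData(mod, train_task, top_features))`. Note that this approach still has to compute the partial dependence for every feature once. To avoid that, a cheaper pre-ranking (for instance via mlr's `generateFeatureImportanceData()` or `generateFilterValuesData()`) might be used instead, though those measure importance differently from the curve range.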

0 answers:

No answers yet