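I used caret with stepwise regression to try 3 different models on a small sample of data. Based on prAUC, I can see which model performs best.
I would like to take the features selected by one of these stepwise models and use them in a model trained on a larger sample.
I can see the final selected features with: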
> formula(step_both_model$finalModel)
.outcome ~ tenure_months + auto_renewal_flag + v_count_ventures +
v_count_hosting_top_ten_competitor + v_count_hosting_long_tail_competitor +
v_count_domains + v_count_email + v_count_ssl + v_count_no_hosting_detected +
v_change_external_mail_petal_count + product_pnl_line_nameCnP.Hosting +
product_pnl_line_nameGrid + product_pnl_line_namePaid.Support +
product_pnl_line_nameShared.Hosting + product_pnl_line_nameWordpress +
shopper_region_1_nameAPAC + shopper_region_1_nameCanada +
shopper_region_1_nameEMEA + shopper_region_1_nameLatAm +
shopper_region_1_nameOthers + usa_tenure
<environment: 0xb77b818>
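My question is: rather than manually cutting and pasting this list of features, how can I extract the predictor names from an R model so that I can use them in another model?
I tried: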
model$finalModel$terms
.outcome ~ tenure_months + auto_renewal_flag + v_count_ventures +
v_count_hosting_top_ten_competitor + v_count_hosting_long_tail_competitor +
v_count_domains + v_count_email + v_count_ssl + v_count_no_hosting_detected +
v_change_external_mail_petal_count + product_pnl_line_nameCnP.Hosting +
product_pnl_line_nameGrid + product_pnl_line_namePaid.Support +
product_pnl_line_nameShared.Hosting + product_pnl_line_nameWordpress +
shopper_region_1_nameAPAC + shopper_region_1_nameCanada +
shopper_region_1_nameEMEA + shopper_region_1_nameLatAm +
shopper_region_1_nameOthers + usa_tenure
attr(,"variables")
list(.outcome, tenure_months, auto_renewal_flag, v_count_ventures,
v_count_hosting_top_ten_competitor, v_count_hosting_long_tail_competitor,
v_count_domains, v_count_email, v_count_ssl, v_count_no_hosting_detected,
v_change_external_mail_petal_count, product_pnl_line_nameCnP.Hosting,
product_pnl_line_nameGrid, product_pnl_line_namePaid.Support,
product_pnl_line_nameShared.Hosting, product_pnl_line_nameWordpress,
shopper_region_1_nameAPAC, shopper_region_1_nameCanada, shopper_region_1_nameEMEA,
shopper_region_1_nameLatAm, shopper_region_1_nameOthers,
usa_tenure)
attr(,"factors")
tenure_months auto_renewal_flag v_count_ventures
.outcome 0 0 0
tenure_months 1 0 0
auto_renewal_flag 0 1 0
v_count_ventures 0 0 1
v_count_hosting_top_ten_competitor 0 0 0
v_count_hosting_long_tail_competitor 0 0 0
v_count_domains 0 0 0
v_count_email 0 0 0
v_count_ssl 0 0 0
v_count_no_hosting_detected 0 0 0
v_change_external_mail_petal_count 0 0 0
product_pnl_line_nameCnP.Hosting 0 0 0
product_pnl_line_nameGrid 0 0 0
product_pnl_line_namePaid.Support 0 0 0
product_pnl_line_nameShared.Hosting 0 0 0
product_pnl_line_nameWordpress 0 0 0
shopper_region_1_nameAPAC 0 0 0
shopper_region_1_nameCanada 0 0 0
shopper_region_1_nameEMEA 0 0 0
shopper_region_1_nameLatAm 0 0 0
shopper_region_1_nameOthers 0 0 0
usa_tenure 0 0 0
v_count_hosting_top_ten_competitor v_count_hosting_long_tail_competitor
.outcome 0 0
tenure_months 0 0
auto_renewal_flag 0 0
v_count_ventures 0 0
v_count_hosting_top_ten_competitor 1 0
v_count_hosting_long_tail_competitor 0 1
v_count_domains 0 0
v_count_email 0 0
v_count_ssl 0 0
v_count_no_hosting_detected 0 0
v_change_external_mail_petal_count 0 0
product_pnl_line_nameCnP.Hosting 0 0
product_pnl_line_nameGrid 0 0
product_pnl_line_namePaid.Support 0 0
product_pnl_line_nameShared.Hosting 0 0
product_pnl_line_nameWordpress 0 0
shopper_region_1_nameAPAC 0 0
shopper_region_1_nameCanada 0 0
shopper_region_1_nameEMEA 0 0
shopper_region_1_nameLatAm 0 0
shopper_region_1_nameOthers 0 0
usa_tenure 0 0
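This gives a lot of information, but I can't see how to extract only the names of the features used as predictors, so that I can use them in a new model (with a larger data sample).
How can I extract the model's feature names, for example to filter a data frame by those names and then pass it to train()?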
Answer 0 (score: 1)
Perhaps try:
update(formula(model$finalModel), newVariable ~ .)
# newVariable ~ crim + zn + chas1 + nox + rm + dis + rad + tax +
# ptratio + b + lstat + `rm:lstat`
# <environment: 0x119e6c6a8>
This gets you what you want more directly. To get just the right-hand side, you can use
formula(model$finalModel)[[3]]
# crim + zn + chas1 + nox + rm + dis + rad + tax + ptratio + b +
# lstat + `rm:lstat`
And to extract the predictors as a character vector, you can do
attr(terms(formula(model$finalModel)), "term.labels")
# [1] "crim" "zn" "chas1" "nox" "rm" "dis"
# [7] "rad" "tax" "ptratio" "b" "lstat" "`rm:lstat`"
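
To tie this back to the question's goal of reusing the selected features on a larger sample, here is a minimal sketch in base R. It is an illustration under assumptions, not the asker's actual setup: `mtcars` stands in for the real data, `lm()` plus `step()` stands in for the caret stepwise model, and the names `small`, `larger`, `selected`, and `new_formula` are made up for the example. Note that in the question's output the term labels look like dummy-variable columns (for example `product_pnl_line_nameCnP.Hosting`), so with factor predictors they may not match the raw column names of the original data frame.

```r
# Hypothetical stand-in data: the first 20 rows play the "small sample".
small  <- mtcars[1:20, ]
larger <- mtcars

# Stand-in for the stepwise model (base R step() instead of caret).
full_fit <- lm(mpg ~ wt + hp + qsec + drat, data = small)
step_fit <- step(full_fit, direction = "both", trace = 0)

# Predictor names as a character vector, as in the answer above.
selected <- attr(terms(formula(step_fit)), "term.labels")

# Option 1: rebuild a formula with a chosen response and the selected terms.
new_formula <- reformulate(selected, response = "mpg")
big_fit <- lm(new_formula, data = larger)

# Option 2: filter the larger data frame down to the selected columns.
# This only works when every term label is a real column name
# (no interactions and no expanded factor levels).
larger_subset <- larger[, c("mpg", selected), drop = FALSE]
```

With caret, the same character vector could in principle be passed on through a formula (as with `new_formula`) or used to subset the predictors given to `train()`, provided the names match columns of the data being supplied.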