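I am trying to use a random forest classifier on some proteomics data. The dataset I am currently using has more than 1000 features, so I decided to use Boruta for feature selection. Boruta selected 31 features, and I intend to use only these selected features to train my classifier and predict my test set. Is there a way to create a new dataset that contains only the features Boruta selected?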
Here is my code snippet:
library(Boruta)
set.seed(223)
boruta.train <- Boruta(Class ~ ., data = training, doTrace = 2)
print(getSelectedAttributes(boruta.train))
[1] "P22087"
[2] "P54886.P54886.2"
[3] "Q4V328.Q4V328.4.B1B0M1.Q4V328.2.Q4V328.3"
[4] "Q9BZJ0.Q5JY65.Q9BZJ0.3.Q9BZJ0.4.Q9BZJ0.5"
[5] "Q9NXF1.Q9NXF1.2.B7Z9D5"
[6] "Q6P158.H7C109.Q6P158.2.F8WAZ3.Q6P158.3.C9J207"
[7] "Q9NVV4.Q5T851"
[8] "O00330.E9PB14.E9PBP7.H0YD97.E9PLU0.E9PRI6"
[9] "Q99575.E5RK39"
[10] "Q96ED9.Q96ED9.2"
[11] "Q02790.F5H1U3.H0YFG2.F5H120"
[12] "Q9P258"
[13] "O00267.O00267.2"
[14] "O14497.O14497.2.O14497.3.H0Y488.Q96SM7.H0YCU6.H0YEW5"
[15] "Q9Y265.Q9Y265.2.E7ETR0.H7C4G5.H7C4I3.J3QLR1"
[16] "Q6PKG0.Q6PKG0.3.E5RH50.H0YC33.H0YC73.H0YBM7.H0YAN4.E5RHK4.H0YBJ5.H0YBR8"
[17] "P26641.B4DTG2.E7EMT2"
[18] "Q7Z6Z7.Q7Z6Z7.3.Q7Z6Z7.2.H0Y5W0.Q5H962.H0Y659.H0Y7U1.Q5H963.A5YM72.2"
[19] "Q96QR8"
[20] "Q13526.Q49AR7"
[21] "P62256.C9JZY6.A4D1L6.C9JZG9.H7C4M9.C9J8Q9"
[22] "P36405"
[23] "O00264.B7Z1L3"
[24] "P04792.F8WE04.C9J3N8"
[25] "O14558"
[26] "P09211.A8MX94"
[27] "P01033.Q5H9A7.H0Y789.Q5H9B5.B4DJK3.Q5H9B4"
[28] "P53350.I3L387.I3L2H5"
[29] "Q96JB2.Q96JB2.2"
[30] "Q13409.2.F8W8S0.Q13409.Q13409.5.Q13409.3.Q13409.6.B7ZA04.E7EV09.E7EQL5.E7EMU4.E7ETL8.E7EQU2.E7ESD3.E7ET01.E7ERH4.E7ERR6.E9PGG1.E7EUM4.E7EU01"
[31] "G5E9Q6.P35080.2.C9J0J7"
How do I remove the "unimportant" features from my dataset? Is this functionality built into Boruta?
I would like to be able to do the following with a new dataset that contains only Boruta's selected features:
library(randomForest)
model <- randomForest(formula = Class ~ ., ntree = 500, mtry = 7, data = training)
model
# Predict and classify the test set
pred <- predict(model, newdata = testing)
table(pred, testing$Class)
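As far as I know, Boruta does not drop columns from your data frame itself; it reports a decision for each attribute, and `getSelectedAttributes()` returns the confirmed names (tentative ones are excluded unless `withTentative = TRUE`). A minimal sketch of building the reduced datasets is below, assuming the response column is called `Class` as in the formula above. The `selected` vector and the toy `training`/`testing` frames are hypothetical stand-ins for `getSelectedAttributes(boruta.train)` and your real data.

```r
# Hypothetical stand-in for getSelectedAttributes(boruta.train)
selected <- c("P22087", "P54886.P54886.2", "Q9P258")

# Hypothetical toy data shaped like the question's training/testing sets
set.seed(223)
make_data <- function(n) {
  data.frame(
    Class = factor(sample(c("case", "control"), n, replace = TRUE)),
    P22087 = rnorm(n),
    P54886.P54886.2 = rnorm(n),
    Q9P258 = rnorm(n),
    P24043 = rnorm(n)  # an unselected feature that should be dropped
  )
}
training <- make_data(20)
testing <- make_data(10)

# Keep the response column plus only the Boruta-selected predictors.
# Selecting by name (not position) keeps train and test columns aligned.
keep <- c("Class", selected)
training.sel <- training[, keep]
testing.sel <- testing[, keep]

print(names(training.sel))
```

With the reduced frames, the model code above should only need its data arguments swapped, e.g. `randomForest(Class ~ ., ntree = 500, mtry = 7, data = training.sel)` followed by `predict(model, newdata = testing.sel)`.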
EDIT
I tried
data2 <- data[,getSelectedAttributes(boruta.train)]
to cut down my dataset, but in the process I lost my first column, "Sample", which I used to classify my samples. Originally my dataset looked like this:
names(data)
[1] "Sample"
[2] "F8W9N1.Q9UHK6.D6RB81.Q9UHK6.2.Q9UHK6.4"
[3] "P24043"
[4] "B4DMU0.P32322.E2QRB3.A6NFM2.J3KQ22.J3QKT4.J3QL24.J3QL32.J3QLK9.J3QL23.J3QR88.J3QRZ0.J3QKT3.J3KTA8.J3KSA9"
[5] "Q53FZ2.H3BSM0.E7ETR5.Q53FZ2.2.H3BVD5.H3BV29.H3BT38.H3BUF2.H3BR33"
[6] "Q9Y666.H0YB78.Q9Y666.2"
I want to keep the "Sample" column alongside the features selected by Boruta. I tried this, but the output was incorrect:
data2$Sample <- data[,getSelectedAttributes(boruta.train)]
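The attempt above most likely fails because it stores a whole data frame of selected columns into a single column named `Sample`, instead of copying the sample labels. A sketch of doing it in one step is to put "Sample" into the vector of names before subsetting, which also keeps it as the first column. The small `data` frame and the `selected` vector here are made up for illustration; in practice `selected` would be `getSelectedAttributes(boruta.train)`.

```r
# Hypothetical toy data shaped like names(data) in the question:
# a "Sample" label column followed by protein-group columns.
data <- data.frame(
  Sample = c("S1", "S2", "S3"),
  P24043 = c(1.2, 0.4, 2.2),
  Q9Y666.H0YB78.Q9Y666.2 = c(0.3, 1.1, 0.8),
  P22087 = c(2.0, 1.5, 0.9),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for getSelectedAttributes(boruta.train)
selected <- c("P22087", "P24043")

# Prepend "Sample" so it survives the column subset as the first column
data2 <- data[, c("Sample", selected)]

print(data2)
```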
EDIT
I was able to restore the Sample column with this simple fix:
data2$Sample <- data$Sample
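An alternative that avoids copying columns at all is to build a formula that names only the selected features and pass it to `randomForest()` with the original data. The sketch below uses base R's `reformulate()`; `selected` is again a hypothetical stand-in for `getSelectedAttributes(boruta.train)`, and it assumes the selected names are syntactic R names (which the ones printed above appear to be). The Boruta package also documents `getConfirmedFormula()` and `getNonRejectedFormula()`, which should return a comparable formula directly from the Boruta object.

```r
# Hypothetical stand-in for getSelectedAttributes(boruta.train)
selected <- c("P22087", "P54886.P54886.2", "Q9P258")

# Build Class ~ P22087 + P54886.P54886.2 + Q9P258 without touching the data
f <- reformulate(selected, response = "Class")

print(f)
```

The formula could then be used as, e.g., `randomForest(f, ntree = 500, mtry = 7, data = training)`, so only the listed predictors enter the model while `training` and `testing` keep all their columns.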