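I am having trouble creating a decision tree with rpart: it takes far too long to complete.
I am not sure whether I need to reduce the number of dimensions, or the number of factor levels, in any of the features of the dataset.
Below you can find the head() and str() output for the dataset. Here is also the link to it.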
   Funct.Area Environment ServiceType Ticket.Nature SLA.Result..4P. IRIS.Priority
2         FUN         DCF         FUN            SR              OK    Priority 3
5         APS         DCF         APS            SR          Defect    Priority 3
7         SEC         DCF         SEC            SR              OK    Priority 4
8         SEC         DCF         SEC            SR          Defect    Priority 4
9         FUN         DCF         FUN            SR              OK    Priority 3
10        SEC         DCF         SEC            SR              OK    Priority 3
'data.frame':	69250 obs. of  6 variables:
 $ Funct.Area     : Factor w/ 27 levels "0","812","APS",..: 13 3 26 26 13 26 26 26 26 26 ...
 $ Environment    : Factor w/ 29 levels " WS","812","BULK",..: 9 9 9 9 9 9 9 9 9 9 ...
 $ ServiceType    : Factor w/ 21 levels "APS","BULK","CNC",..: 8 1 18 18 8 18 18 18 18 18 ...
 $ Ticket.Nature  : Factor w/ 5 levels "BULK","CHG","HK",..: 5 5 5 5 5 5 5 5 5 5 ...
 $ SLA.Result..4P.: Factor w/ 5 levels "#¡REF!","#N/A",..: 5 3 5 3 5 5 5 5 5 5 ...
 $ IRIS.Priority  : Factor w/ 4 levels "Priority 1","Priority 2",..: 3 3 4 4 3 3 3 3 4 4 ...
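My understanding is that the rpart package can handle categorical variables with up to 32 distinct levels.
Is there any way to reduce the processing time?
Here is the link to the R script.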
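
For reference, here is a minimal base R sketch of how the number of levels per factor could be checked, and how levels that never occur could be dropped. The data frame `tickets` is a small made-up stand-in, not the real dataset; its name and values are assumptions for illustration only.

```r
# Toy stand-in for the real data; the name `tickets` and these values are illustrative.
tickets <- data.frame(
  Funct.Area    = factor(c("FUN", "APS", "SEC", "SEC", "FUN", "SEC")),
  Ticket.Nature = factor(c("SR", "SR", "SR", "SR", "SR", "SR"),
                         levels = c("CHG", "HK", "SR"))
)

# Number of declared levels in each factor column.
level_counts <- sapply(tickets, nlevels)
print(level_counts)

# droplevels() removes levels that are declared but never occur in the data.
tickets_trimmed <- droplevels(tickets)
used_counts <- sapply(tickets_trimmed, nlevels)
print(used_counts)
```

Comparing the two vectors shows whether any column carries unused levels that could be removed before fitting.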