Question

I am building a naive Bayes prediction model on a very diverse dataset. The output of str looks like this:

'data.frame':   1244 obs. of  24 variables:
 $ Opportunity.ID           : chr  "006D000000YuMhG" "0065700000xKQDI" "0065700000xp0Tq" "0065700000xpxs3" ...
 $ Stage                    : Factor w/ 2 levels "Closed Lost",..: 1 1 2 2 2 2 2 2 1 1 ...
 $ Opportunity.Owner        : Factor w/ 26 levels "ABA","ALE","BAD",..: 19 7 19 1 17 11 1 7 11 13 ...
 $ Solution.Type            : Factor w/ 4 levels "","Hybrid","MCS",..: 4 3 4 4 4 4 4 4 4 4 ...
 $ New.Business             : Factor w/ 2 levels "0","1": 2 1 1 1 1 2 1 1 2 1 ...
 $ Delivery.Countries       : Factor w/ 5 levels "India","Netherlands",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ Presales.Owner           : Factor w/ 23 levels "","ABA","AVR",..: 1 1 1 1 9 1 1 1 1 1 ...
 $ Age                      : int  2604 36 13 2 30 71 1 0 11 396 ...
 $ Days.in.current.stage    : int  425 425 427 429 428 427 427 426 422 415 ...
 $ Days.in.previous.stage   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Previous.stage           : Factor w/ 10 levels "","Advise and Design Solution",..: 5 5 8 8 8 8 5 8 5 5 ...
 $ Industry                 : Factor w/ 39 levels "Agriculture",..: 12 38 26 22 5 26 6 39 7 6 ...
 $ Account.Created.Date     : Date, format: "2014-04-30" "2014-04-30" "2014-04-30" "2014-04-30" ...
 $ Total.Revenue.Converted  : num  0 0 1705 -3360 27596 ...
 $ Account.Type             : Factor w/ 7 levels "","Customer",..: 6 2 2 2 2 2 2 5 6 2 ...
 $ Billing.City             : chr  "hengelo" "gouda" "appingedam" "rotterdam" ...
 $ Shipping.City            : chr  "hengelo" "gouda" "appingedam" "rotterdam" ...
 $ Total.Opportunities      : int  2 21 84 36 27 5 5 19 2 25 ...
 $ Won.Opportunity.Count    : int  0 14 55 26 18 3 5 18 0 15 ...
 $ Number.Live.Opportunities: int  0 1 1 4 4 0 0 0 0 2 ...
 $ First.Order.Date         : Date, format: NA "2013-12-12" "2012-02-10" "2013-12-06" ...
 $ Last.Order.Date          : Date, format: NA "2020-02-04" "2020-02-27" "2019-08-23" ...
 $ Primary.Campaign.Source  : Factor w/ 114 levels "","600 Minutes Public IT 2012",..: 4 1 1 1 1 1 1 1 1 1 ...
 $ First.Campaign.Touch     : Factor w/ 81 levels "","600 min.Executive IT'11",..: 1 1 1 1 1 1 1 1 1 1 ...

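Although this is only a first prototype of the program, everything works well, and the model is already quite accurate at predicting Stage. There is just one thing I can't shake...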

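How can I determine which variables have the most influence on the actual prediction? I have searched a lot on this topic, but most of the examples I have found either use a different algorithm or work on purely numeric datasets. How do I identify the important variables in a mixed dataset like this one?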

Determining the relevant variables for a naive Bayes prediction

0 answers: