Question

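I was given a training dataset about sick horses, containing data on surgery and disease. Some of the fields in each record are the horse's temperature, age, pulse, respiratory rate, and so on.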

I want to build a classifier on the lived/died/euthanized column of each row. I have been asked to check two things:

Consider the assumption that the variables are independent
Check whether I have enough instances to obtain reliable probabilities

25% of the values in the dataset were missing, and they were imputed with the MIMMI imputation method.

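Regarding the reliability of the probabilities, I can see that the training dataset is somewhat imbalanced: 179 horses lived and 121 did not (died + euthanized). But I am really not sure. Any help with these two questions would be much appreciated.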
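For reference, the class proportions implied by these counts can be worked out with a few lines of Python. This is a minimal sketch: the per-class counts (179 / 77 / 44) are read from the "weight sum" rows of the Weka output, and the add-one (Laplace) smoothing is an assumption about how the priors in the model header (0.59, 0.26, 0.15) were obtained.

```python
# Class counts taken from the "weight sum" rows of the Weka output.
counts = {"lived": 179, "died": 77, "euthanized": 44}
total = sum(counts.values())  # 300 instances

# Raw class proportions.
raw = {c: n / total for c, n in counts.items()}

# Add-one (Laplace) smoothed priors.
# Assumption: this is how the 0.59 / 0.26 / 0.15 header values arise.
k = len(counts)
smoothed = {c: (n + 1) / (total + k) for c, n in counts.items()}

for c in counts:
    print(f"{c:<11} raw={raw[c]:.3f}  smoothed={smoothed[c]:.3f}")
```

The smallest class rests on only 44 horses, so its per-attribute estimates are the ones most exposed to noise.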

=== Run information ===

Scheme:weka.classifiers.bayes.NaiveBayes 
Relation:     horseColic-weka.filters.unsupervised.attribute.Remove-R25-27
Instances:    300
Attributes:   24
              surgery
              age
              id
              temp
              pulse
              respRate
              tempExtrem
              periPulse
              mucMemb
              capRefT
              pain
              peri
              abdDist
              ngTube
              ngReflux
              ngRPH
              feces
              abd
              pCellVol
              totProt
              abdCentApp
              abdCentTotProt
              outc
              surgLes
Test mode:10-fold cross-validation

=== Classifier model (full training set) ===

Naive Bayes Classifier

                                  Class
Attribute                         lived         died   euthanized
                                 (0.59)       (0.26)       (0.15)
==================================================================
surgery
  yes                               97.0         59.0         28.0
  no                                84.0         20.0         18.0
  [total]                          181.0         79.0         46.0

age
  adult                            168.0         67.0         44.0
  young                             13.0         12.0          2.0
  [total]                          181.0         79.0         46.0

id
  mean                      1009274.0202 1452556.3598  751596.8611
  std. dev.                 1431022.1677 1887025.7703  989556.6807
  weight sum                         179           77           44
  precision                    16915.735    16915.735    16915.735

temp
  mean                           34.8733      35.0055       33.054
  std. dev.                      10.2335      13.0545      14.9588
  weight sum                         179           77           44
  precision                       0.9275       0.9275       0.9275

pulse
  mean                           29.2039      33.2115      29.0187
  std. dev.                      10.8578      14.6404      16.7248
  weight sum                         179           77           44
  precision                       0.9107       0.9107       0.9107

respRate
  mean                           15.0771      16.9169      15.9348
  std. dev.                       8.9803       7.0278       8.1221
  weight sum                         179           77           44
  precision                       0.8667       0.8667       0.8667

tempExtrem
  normal                            82.0         16.0         12.0
  warm                              36.0          7.0          3.0
  cool                              53.0         48.0         25.0
  cold                              12.0         10.0          8.0
  [total]                          183.0         81.0         48.0

periPulse
  normal                           133.0         22.0         11.0
  increased                          5.0          8.0          7.0
  reduced                           43.0         47.0         25.0
  absent                             2.0          4.0          5.0
  [total]                          183.0         81.0         48.0

mucMemb
  normal-pink                       95.0          9.0          7.0
  bright-pink                       23.0         13.0          6.0
  pale-pink                         37.0         19.0         12.0
  pale-cyanotic                     16.0         17.0         12.0
  bright-red                         7.0         14.0          8.0
  dark-cyanotic                      7.0         11.0          5.0
  [total]                          185.0         83.0         50.0

capRefT
  short                            153.0         46.0         23.0
  long                              28.0         33.0         23.0
  long2                              1.0          1.0          1.0
  [total]                          182.0         80.0         47.0

pain
  no-pain                           53.0          6.0          8.0
  depressed                         42.0         21.0         14.0
  inte-mild-pain                    64.0         10.0          8.0
  inte-severe-pain                  12.0         18.0         12.0
  cont-severe-pain                  13.0         27.0          7.0
  [total]                          184.0         82.0         49.0

peri
  hypermotile                       42.0          7.0          7.0
  normal                            22.0          8.0          5.0
  hypomotile                        90.0         37.0         17.0
  absent                            29.0         29.0         19.0
  [total]                          183.0         81.0         48.0

abdDist
  none                              88.0         17.0         13.0
  slight                            53.0         18.0          8.0
  moderate                          28.0         30.0         14.0
  severe                            14.0         16.0         13.0
  [total]                          183.0         81.0         48.0

ngTube
  none                              79.0         40.0         27.0
  slight                            90.0         32.0         15.0
  significant                       13.0          8.0          5.0
  [total]                          182.0         80.0         47.0

ngReflux
  none                             149.0         50.0         30.0
  much                              17.0         15.0          6.0
  less                              16.0         15.0         11.0
  [total]                          182.0         80.0         47.0

ngRPH
  mean                           11.3797      13.0882       8.0606
  std. dev.                       2.3535       3.2916       5.1673
  weight sum                         179           77           44
  precision                       0.7917       0.7917       0.7917

feces
  normal                            77.0         14.0         10.0
  increased                         16.0         14.0          8.0
  decreased                         44.0         15.0         11.0
  absent                            46.0         38.0         19.0
  [total]                          183.0         81.0         48.0

abd
  normal                            48.0         13.0          4.0
  other                             39.0          5.0          7.0
  firm-large-intestine              18.0          8.0          6.0
  dist-small-intest                 32.0         24.0          8.0
  distended-large-intest            47.0         32.0         24.0
  [total]                          184.0         82.0         49.0

pCellVol
  mean                           31.0162      47.0465      46.0112
  std. dev.                      14.1207      18.5468       17.672
  weight sum                         179           77           44
  precision                       0.9518       0.9518       0.9518

totProt
  mean                           42.6539       41.451      43.7936
  std. dev.                      16.9138      18.6362      19.3247
  weight sum                         179           77           44
  precision                       0.9432       0.9432       0.9432

abdCentApp
  clear                            112.0         25.0         10.0
  cloudy                            54.0         22.0         20.0
  serosanguinous                    16.0         33.0         17.0
  [total]                          182.0         80.0         47.0

abdCentTotProt
  mean                           16.1341      21.1634      14.3203
  std. dev.                       6.8038       4.9109       8.6619
  weight sum                         179           77           44
  precision                       0.8837       0.8837       0.8837

surgLes
  yes                               94.0         70.0         30.0
  no                                87.0          9.0         16.0
  [total]                          181.0         79.0         46.0



Time taken to build model: 0.01 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances         216               72      %
Incorrectly Classified Instances        84               28      %
Kappa statistic                          0.5134
Mean absolute error                      0.1965
Root mean squared error                  0.3803
Relative absolute error                 52.8451 %
Root relative squared error             88.2672 %
Total Number of Instances              300     

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 0.777     0.198      0.853     0.777     0.813      0.873    lived
                 0.675     0.175      0.571     0.675     0.619      0.871    died
                 0.568     0.082      0.543     0.568     0.556      0.824    euthanized
Weighted Avg.    0.72      0.175      0.735     0.72      0.725      0.865

=== Confusion Matrix ===

   a   b   c   <-- classified as
 139  28  12 |   a = lived
  16  52   9 |   b = died
   8  11  25 |   c = euthanized
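
The summary figures can be re-derived from the confusion matrix alone, which helps when reading them. The following is a small sketch in plain Python; the matrix is copied from the output above, and the variable names are illustrative only.

```python
labels = ["lived", "died", "euthanized"]

# Rows are the actual classes, columns are the predicted classes.
cm = [
    [139, 28, 12],
    [16, 52, 9],
    [8, 11, 25],
]

k = len(labels)
n = sum(sum(row) for row in cm)
row_tot = [sum(row) for row in cm]                             # actual per class
col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # predicted per class

# Observed agreement (accuracy) and agreement expected by chance.
accuracy = sum(cm[i][i] for i in range(k)) / n
expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2

# Cohen's kappa: accuracy corrected for chance agreement.
kappa = (accuracy - expected) / (1 - expected)

recall = {labels[i]: cm[i][i] / row_tot[i] for i in range(k)}
precision = {labels[i]: cm[i][i] / col_tot[i] for i in range(k)}

print(f"accuracy={accuracy:.3f}  kappa={kappa:.4f}")
for name in labels:
    print(f"{name:<11} recall={recall[name]:.3f}  precision={precision[name]:.3f}")
```

Recall and precision fall from "lived" to "died" to "euthanized", in the same order as the class sizes.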

Answer 1

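Naive Bayes makes one prominent assumption: all attributes are independent of one another given the class. Here that means age, surgery, temperature and so on are treated as mutually independent. That may not be true, and in many cases it is not. Even so, Naive Bayes often gives decent results with very little training, although usually not as good as a model whose assumptions are closer to reality. Finding such a model takes time and effort, and a Naive Bayes model is often accurate enough. As for your sample size, I am not sure; you would have to look at the statistical power of the dataset.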
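
To make the independence assumption concrete, here is a minimal sketch of how the class posterior is formed as a product of per-attribute terms. It uses only two nominal attributes, `surgLes` and `age`, with the counts printed in the model table of the question. The real model multiplies in all attributes, including Gaussian densities for the numeric ones, so the numbers below are illustrative only. Treating the table counts as already add-one smoothed is an assumption, based on each `[total]` exceeding the class weight sum by the number of attribute values.

```python
# Smoothed class counts: weight sums 179 / 77 / 44, plus one each (assumption).
prior_counts = {"lived": 180, "died": 78, "euthanized": 45}

# (count for the observed value, [total]) per class, copied from the model table.
surg_les_yes = {"lived": (94, 181), "died": (70, 79), "euthanized": (30, 46)}
age_adult = {"lived": (168, 181), "died": (67, 79), "euthanized": (44, 46)}

prior_total = sum(prior_counts.values())

# Naive Bayes score: prior times the product of per-attribute conditionals.
# Multiplying the conditionals is exactly the independence assumption.
scores = {}
for cls, count in prior_counts.items():
    p = count / prior_total
    for table in (surg_les_yes, age_adult):
        hits, total = table[cls]
        p *= hits / total
    scores[cls] = p

# Normalise so the posteriors sum to one.
norm = sum(scores.values())
posterior = {cls: s / norm for cls, s in scores.items()}

for cls, p in posterior.items():
    print(f"P({cls} | surgLes=yes, age=adult) = {p:.3f}")
```

If two attributes are strongly related, their evidence is effectively counted twice in this product, which is how a violated assumption distorts the estimated probabilities.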

Weka machine learning - interpreting Naive Bayes
