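I have been given a training dataset about sick horses, containing data on surgeries and diseases. Some of the fields in each record are the horse's temperature, age, pulse, respiratory rate, and so on.
I want to build a classifier on the lived/died/euthanized column of each row. The two things I have been asked to check are:
The dataset has 25% missing values, which are imputed using the MIMMI imputation method.
The possibility of obtaining reliable probabilities. I can see that the training dataset is somewhat imbalanced: 179 horses lived and 121 did not (died + euthanized). But I am really not sure. Any help with these two questions would be much appreciated.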
=== Run information ===
Scheme:weka.classifiers.bayes.NaiveBayes
Relation: horseColic-weka.filters.unsupervised.attribute.Remove-R25-27
Instances: 300
Attributes: 24
surgery
age
id
temp
pulse
respRate
tempExtrem
periPulse
mucMemb
capRefT
pain
peri
abdDist
ngTube
ngReflux
ngRPH
feces
abd
pCellVol
totProt
abdCentApp
abdCentTotProt
outc
surgLes
Test mode:10-fold cross-validation
=== Classifier model (full training set) ===
Naive Bayes Classifier
Class
Attribute lived died euthanized
(0.59) (0.26) (0.15)
==================================================================
surgery
yes 97.0 59.0 28.0
no 84.0 20.0 18.0
[total] 181.0 79.0 46.0
age
adult 168.0 67.0 44.0
young 13.0 12.0 2.0
[total] 181.0 79.0 46.0
id
mean 1009274.0202 1452556.3598 751596.8611
std. dev. 1431022.1677 1887025.7703 989556.6807
weight sum 179 77 44
precision 16915.735 16915.735 16915.735
temp
mean 34.8733 35.0055 33.054
std. dev. 10.2335 13.0545 14.9588
weight sum 179 77 44
precision 0.9275 0.9275 0.9275
pulse
mean 29.2039 33.2115 29.0187
std. dev. 10.8578 14.6404 16.7248
weight sum 179 77 44
precision 0.9107 0.9107 0.9107
respRate
mean 15.0771 16.9169 15.9348
std. dev. 8.9803 7.0278 8.1221
weight sum 179 77 44
precision 0.8667 0.8667 0.8667
tempExtrem
normal 82.0 16.0 12.0
warm 36.0 7.0 3.0
cool 53.0 48.0 25.0
cold 12.0 10.0 8.0
[total] 183.0 81.0 48.0
periPulse
normal 133.0 22.0 11.0
increased 5.0 8.0 7.0
reduced 43.0 47.0 25.0
absent 2.0 4.0 5.0
[total] 183.0 81.0 48.0
mucMemb
normal-pink 95.0 9.0 7.0
bright-pink 23.0 13.0 6.0
pale-pink 37.0 19.0 12.0
pale-cyanotic 16.0 17.0 12.0
bright-red 7.0 14.0 8.0
dark-cyanotic 7.0 11.0 5.0
[total] 185.0 83.0 50.0
capRefT
short 153.0 46.0 23.0
long 28.0 33.0 23.0
long2 1.0 1.0 1.0
[total] 182.0 80.0 47.0
pain
no-pain 53.0 6.0 8.0
depressed 42.0 21.0 14.0
inte-mild-pain 64.0 10.0 8.0
inte-severe-pain 12.0 18.0 12.0
cont-severe-pain 13.0 27.0 7.0
[total] 184.0 82.0 49.0
peri
hypermotile 42.0 7.0 7.0
normal 22.0 8.0 5.0
hypomotile 90.0 37.0 17.0
absent 29.0 29.0 19.0
[total] 183.0 81.0 48.0
abdDist
none 88.0 17.0 13.0
slight 53.0 18.0 8.0
moderate 28.0 30.0 14.0
severe 14.0 16.0 13.0
[total] 183.0 81.0 48.0
ngTube
none 79.0 40.0 27.0
slight 90.0 32.0 15.0
significant 13.0 8.0 5.0
[total] 182.0 80.0 47.0
ngReflux
none 149.0 50.0 30.0
much 17.0 15.0 6.0
less 16.0 15.0 11.0
[total] 182.0 80.0 47.0
ngRPH
mean 11.3797 13.0882 8.0606
std. dev. 2.3535 3.2916 5.1673
weight sum 179 77 44
precision 0.7917 0.7917 0.7917
feces
normal 77.0 14.0 10.0
increased 16.0 14.0 8.0
decreased 44.0 15.0 11.0
absent 46.0 38.0 19.0
[total] 183.0 81.0 48.0
abd
normal 48.0 13.0 4.0
other 39.0 5.0 7.0
firm-large-intestine 18.0 8.0 6.0
dist-small-intest 32.0 24.0 8.0
distended-large-intest 47.0 32.0 24.0
[total] 184.0 82.0 49.0
pCellVol
mean 31.0162 47.0465 46.0112
std. dev. 14.1207 18.5468 17.672
weight sum 179 77 44
precision 0.9518 0.9518 0.9518
totProt
mean 42.6539 41.451 43.7936
std. dev. 16.9138 18.6362 19.3247
weight sum 179 77 44
precision 0.9432 0.9432 0.9432
abdCentApp
clear 112.0 25.0 10.0
cloudy 54.0 22.0 20.0
serosanguinous 16.0 33.0 17.0
[total] 182.0 80.0 47.0
abdCentTotProt
mean 16.1341 21.1634 14.3203
std. dev. 6.8038 4.9109 8.6619
weight sum 179 77 44
precision 0.8837 0.8837 0.8837
surgLes
yes 94.0 70.0 30.0
no 87.0 9.0 16.0
[total] 181.0 79.0 46.0
Time taken to build model: 0.01 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances 216 72 %
Incorrectly Classified Instances 84 28 %
Kappa statistic 0.5134
Mean absolute error 0.1965
Root mean squared error 0.3803
Relative absolute error 52.8451 %
Root relative squared error 88.2672 %
Total Number of Instances 300
=== Detailed Accuracy By Class ===
               TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
               0.777     0.198     0.853       0.777    0.813       0.873      lived
               0.675     0.175     0.571       0.675    0.619       0.871      died
               0.568     0.082     0.543       0.568    0.556       0.824      euthanized
Weighted Avg.  0.72      0.175     0.735       0.72     0.725       0.865
=== Confusion Matrix ===
   a   b   c   <-- classified as
 139  28  12 |   a = lived
  16  52   9 |   b = died
   8  11  25 |   c = euthanized
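As a sanity check on the imbalance question, the summary figures can be recomputed from the confusion matrix alone. The following is a minimal Python sketch, not part of the Weka run. The function name `summarize` and the majority-class baseline are illustrative additions. The matrix values are copied from the output above.

```python
# Confusion matrix copied from the Weka output
# (rows = actual class, columns = predicted class).
labels = ["lived", "died", "euthanized"]
matrix = [
    [139, 28, 12],
    [16, 52, 9],
    [8, 11, 25],
]


def summarize(matrix, labels):
    """Return accuracy, Cohen's kappa, majority baseline, and per-class precision/recall."""
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]        # actual instances per class
    col_sums = [sum(col) for col in zip(*matrix)]  # predicted instances per class
    correct = sum(matrix[i][i] for i in range(len(labels)))

    accuracy = correct / total
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)

    # Accuracy of always predicting the most frequent class.
    baseline = max(row_sums) / total

    per_class = {}
    for i, label in enumerate(labels):
        precision = matrix[i][i] / col_sums[i]
        recall = matrix[i][i] / row_sums[i]
        per_class[label] = (precision, recall)
    return accuracy, kappa, baseline, per_class


accuracy, kappa, baseline, per_class = summarize(matrix, labels)
print(f"accuracy={accuracy:.3f} kappa={kappa:.4f} baseline={baseline:.3f}")
for label, (precision, recall) in per_class.items():
    print(f"{label}: precision={precision:.3f} recall={recall:.3f}")
```

Comparing the 72% accuracy with the majority-class baseline (always predicting lived) gives a rough sense of how much the classifier adds beyond the class imbalance.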
Answer 0 (score: 0)
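Naive Bayes makes one prominent assumption: that all attributes are independent of one another given the class. Here that means age, surgery, and temperature are treated as independent of each other. That may well not hold, and in many cases it does not. Even so, Naive Bayes often gives decent results with little training, although usually not as good as models whose assumptions are closer to the truth. Finding such models takes time and effort, whereas a Naive Bayes model often reaches adequate accuracy. As for your sample size, I am not sure; you would have to look at the statistical power of your dataset.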
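To make the independence assumption concrete, here is a minimal Python sketch of how Naive Bayes combines evidence: it multiplies the class prior by one likelihood per attribute and then normalizes. It assumes a hypothetical horse described by only two of the nominal attributes (surgLes = yes, periPulse = reduced) and uses the rounded priors and the counts printed in the model listing in the question. The function name `posterior` is illustrative. Weka's own prediction would use all attributes, so these numbers are not what Weka would report for a real instance.

```python
# Rounded class priors as printed in the Weka model listing.
priors = {"lived": 0.59, "died": 0.26, "euthanized": 0.15}

# Per-class counts copied from the model listing: (count for the value, [total]).
# Assumption: the hypothetical horse has surgLes = yes and periPulse = reduced.
evidence = {
    "surgLes=yes": {"lived": (94.0, 181.0), "died": (70.0, 79.0), "euthanized": (30.0, 46.0)},
    "periPulse=reduced": {"lived": (43.0, 183.0), "died": (47.0, 81.0), "euthanized": (25.0, 48.0)},
}


def posterior(priors, evidence):
    """Naive Bayes posterior: prior times the product of per-attribute likelihoods."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for counts in evidence.values():
            count, total = counts[cls]
            # Independence assumption: each attribute contributes its own
            # factor, regardless of the values of the other attributes.
            score *= count / total
        scores[cls] = score
    norm = sum(scores.values())
    return {cls: score / norm for cls, score in scores.items()}


post = posterior(priors, evidence)
for cls, p in post.items():
    print(f"{cls}: {p:.3f}")
```

If two attributes are strongly correlated (for example, several signs of poor circulation), their factors are still multiplied as if each were fresh evidence, which tends to push the posterior probabilities toward 0 or 1. That is one reason the class ranking from Naive Bayes can be reasonable while the probabilities themselves are less reliable.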