Question

I have a data frame (http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data).

str(df) yields:

'data.frame':   699 obs. of  11 variables:
 $ Code #                     : int  1000025 1002945 1015425 1016277 1017023 1017122 1018099 1018561 1033078 1033078 ...
 $ Clump Thickness            : int  5 5 3 6 4 8 1 2 2 4 ...
 $ Uniformity of Cell Size    : int  1 4 1 8 1 10 1 1 1 2 ...
 $ Uniformity of Cell Shape   : int  1 4 1 8 1 10 1 2 1 1 ...
 $ Marginal Adhesion          : int  1 5 1 1 3 8 1 1 1 1 ...
 $ Single Epithelial Cell Size: int  2 7 2 3 2 7 2 2 2 2 ...
 $ Bare Nuclei                : int  1 10 2 4 1 10 10 1 1 1 ...
 $ Bland Chromatin            : int  3 3 3 3 3 9 3 3 1 2 ...
 $ Normal Nucleoli            : int  1 2 1 7 1 7 1 1 1 1 ...
 $ Mitoses                    : int  1 1 1 1 1 1 1 1 5 1 ...
 $ Class                      : Factor w/ 2 levels "2","4": 1 1 1 1 1 2 1 1 1 1 ...

I am trying to run 10-fold cross-validation for predicting "Class". Factor level 2 is benign and 4 is malignant.

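I have split the data frame into 10 test sets and used the predict() function with a naive Bayes classifier to find the a-priori probabilities for each test set.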

predict(nb, a, type = c("raw"))
where nb is the naive Bayes classifier and a is the first test set

Here are the first few values of the prediction output:

                  2            4
  [1,]  1.000000e+00 3.671148e-09
  [2,]  1.390736e-19 1.000000e+00
  [3,]  1.000000e+00 1.238558e-09
  [4,]  1.459450e-24 1.000000e+00
  [5,]  1.000000e+00 9.585543e-09
  [6,]  2.451592e-75 1.000000e+00
  [7,]  1.379640e-03 9.986204e-01
  [8,]  1.000000e+00 7.171687e-10

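I am unable to find the mean of the a-priori probabilities for the Benign (2) and Malignant (4) classes. How can I average these columns and print the values?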

Answer 1

There is a very useful R function called colMeans. Assuming the results are stored in an object named res, then

colMeans(res)

will give you the desired column means.
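
As a minimal sketch of column averaging with colMeans, the snippet below rebuilds the eight rows shown in the question as a matrix. The name res and the hard-coded values are assumptions for illustration only; in practice res would hold the result of predict(nb, a, type = "raw").

```r
# Eight rows copied from the prediction output shown in the question.
# In real use, res would come from predict(nb, a, type = "raw").
res <- matrix(
  c(1.000000e+00, 3.671148e-09,
    1.390736e-19, 1.000000e+00,
    1.000000e+00, 1.238558e-09,
    1.459450e-24, 1.000000e+00,
    1.000000e+00, 9.585543e-09,
    2.451592e-75, 1.000000e+00,
    1.379640e-03, 9.986204e-01,
    1.000000e+00, 7.171687e-10),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("2", "4"))
)

# colMeans returns a named vector with one mean per column (class)
col_means <- colMeans(res)
print(col_means)
```

Because the matrix carries column names, the result is a named vector, so the mean for a single class can be picked out with col_means[["2"]] or col_means[["4"]].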

Answer 2

Assuming your output is a matrix, simply:

> b<-predict(nb, a, type = c("raw"))
> mean(b[,1])
[1] 0.5001725
> mean(b[,2])
[1] 0.4998276

Just assign the output to a variable, then use [,i] to select column i.
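
The same idea extends to all ten folds. The following is only a sketch under stated assumptions: fold_preds is a hypothetical list holding one predict(..., type = "raw") matrix per fold, and it is filled here with random placeholder values so the snippet runs without the classifier or the data set.

```r
set.seed(1)

# Hypothetical stand-in for ten per-fold prediction matrices.
# The values are random placeholders, not results from the real data.
fold_preds <- lapply(1:10, function(i) {
  p <- runif(70)
  matrix(c(p, 1 - p), ncol = 2, dimnames = list(NULL, c("2", "4")))
})

# One row per fold, one column per class
fold_means <- t(sapply(fold_preds, colMeans))
print(fold_means)

# Average of the per-fold means
overall <- colMeans(fold_means)
print(overall)

# Alternative: stack all rows first, so every observation counts equally
pooled <- colMeans(do.call(rbind, fold_preds))
print(pooled)
```

When all folds have the same number of rows, as in this placeholder data, overall and pooled agree. With unequal fold sizes (699 observations do not divide evenly by 10), the pooled version weights each observation equally, while the average of fold means weights each fold equally.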

Hope this helps.

10-fold cross-validation averages in R
