I have started implementing Multinomial Naive Bayes using the following DataFrame:
   Disc  Bus  Dep  Edu
0     1    2    2    1
1     0    1    1    1
2     1    2    1    4
3     0    1    1    1
4     0    2    1    3
I also split it into train/test sets:
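from sklearn.model_selection import train_test_split

# Features are every column except the target 'Disc'
X = data_rev.drop('Disc', axis = 1)
y = data_rev['Disc']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 21)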
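Then I started computing the probabilities. 1. Compute the log prior probability of each class (np.unique returns the sorted unique elements of an array):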
separated = [[x for x, t in zip(X_train, y_train) if t == c] for c in np.unique(y_train)]
count_sample = X_train.shape[0]
self.class_log_prior_ = [np.log(len(i) / count_sample) for i in separated]
count = np.array([np.array(i).sum(axis = 0) for i in separated])
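However, this fails with TypeError: cannot perform reduce with flexible type. Apparently, when axis = 0 is passed, NumPy encounters strings (the DataFrame's column headers) and cannot perform the reduction. What is going on here, and how can I fix the count computation?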
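A likely cause, sketched here under stated assumptions: iterating over a DataFrame (which is what zip(X_train, y_train) does) yields its column labels rather than its rows, so separated ends up holding strings. The minimal sketch below rebuilds the table from the question by hand and, to stay self-contained, uses the full data instead of the train split. The names X_arr, y_arr, labels_seen and class_log_prior are illustrative and not from the original code.

```python
import numpy as np
import pandas as pd

# Toy data copied from the table in the question.
data_rev = pd.DataFrame({
    "Disc": [1, 0, 1, 0, 0],
    "Bus":  [2, 1, 2, 1, 2],
    "Dep":  [2, 1, 1, 1, 1],
    "Edu":  [1, 1, 4, 1, 3],
})
X = data_rev.drop("Disc", axis=1)
y = data_rev["Disc"]

# Iterating over a DataFrame yields column labels, so zip pairs
# 'Bus', 'Dep', 'Edu' with the first three targets instead of pairing rows.
labels_seen = [x for x, t in zip(X, y)]

# Converting to NumPy arrays first makes zip pair each row with its target.
X_arr = X.to_numpy()
y_arr = y.to_numpy()
classes = np.unique(y_arr)
separated = [[x for x, t in zip(X_arr, y_arr) if t == c] for c in classes]

# Log prior per class and per-class feature totals, as in the question.
class_log_prior = [np.log(len(rows) / X_arr.shape[0]) for rows in separated]
count = np.array([np.array(rows).sum(axis=0) for rows in separated])
```

With numeric rows in separated, the sum over axis 0 should no longer hit string data. A boolean mask such as X_arr[y_arr == c].sum(axis=0) would be an equivalent way to get each row of count without the nested list comprehension.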
Step 3 is as follows:
feature_log_prob_ = np.log(count / count.sum(axis = 1)[np.newaxis].T)
This of course raises an error as well, since it depends on count.
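For reference, here is a small sketch of step 3 once count is a purely numeric array. The count values are hypothetical placeholders (rows are classes, columns are features), and the alpha smoothing term is an assumption that is not part of the original snippet; it is a common way to avoid log(0) when a feature never occurs in a class.

```python
import numpy as np

# Hypothetical per-class feature totals: rows = classes, columns = features.
count = np.array([[4, 3, 5],
                  [4, 3, 5]])

# Same normalisation as the original expression; keepdims=True keeps the
# row sums as a column vector, so no [np.newaxis].T reshaping is needed.
feature_log_prob = np.log(count / count.sum(axis=1, keepdims=True))

# Assumed additive (Laplace) smoothing, not in the original code.
alpha = 1.0
smoothed = count + alpha
feature_log_prob_smoothed = np.log(smoothed / smoothed.sum(axis=1, keepdims=True))
```

Each row of exp(feature_log_prob) sums to 1, which is a quick sanity check that the normalisation runs across features within a class.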