Skipping strings when performing .sum() with numpy

Date: 2019-03-27 03:10:14

Tags: python arrays pandas numpy naivebayes

I have started implementing multinomial Naive Bayes with the following DataFrame:

   Disc Bus Dep Edu
0   1   2   2   1   
1   0   1   1   1   
2   1   2   1   4   
3   0   1   1   1   
4   0   2   1   3

I also split it into training and test sets:

X = data_rev.drop('Disc', axis = 1)

y = data_rev['Disc']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 21)

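Then I started computing the probabilities:

1. Compute the log prior probability of each class (np.unique returns the sorted unique elements of an array):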

separated = [[x for x, t in zip(X_train, y_train) if t == c] for c in np.unique(y_train)]

count_sample = X_train.shape[0]

self.class_log_prior_ = [np.log(len(i) / count_sample) for i in separated]
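For reference, a minimal sketch of the same prior computation done directly on the labels, without building `separated` first. The label array here is illustrative (it is the full `Disc` column from the table above, not the actual training split), and the variable names are assumptions.

```python
import numpy as np

# Illustrative labels: the Disc column from the sample table.
y_train = np.array([1, 0, 1, 0, 0])

# np.unique can also return how often each sorted unique value occurs.
classes, class_counts = np.unique(y_train, return_counts=True)

# Log prior of each class = log(class frequency / number of samples).
class_log_prior_ = np.log(class_counts / y_train.shape[0])
print(classes, class_log_prior_)
```

The priors come out in the same order as `np.unique(y_train)`, so they line up with the per-class lists built in the next step.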

2. Count the occurrences of each feature within each class:

count = np.array([np.array(i).sum(axis = 0) for i in separated])

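But this fails with TypeError: cannot perform reduce with flexible type. Apparently, when axis = 0 is passed, numpy encounters a string (the DataFrame header) and cannot perform the operation. Why does this happen, and how can I fix it in the count operation?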
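A minimal sketch of what seems to be going on, under the assumption that `X_train` is still a pandas DataFrame when it is passed to `zip`. Iterating over a DataFrame yields its column labels rather than its rows, so the per-class lists end up holding the header strings. To keep the sketch self-contained it uses the full sample table instead of the `train_test_split` output, and selecting rows with a boolean mask is just one possible workaround.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question (named data_rev there).
data_rev = pd.DataFrame({
    "Disc": [1, 0, 1, 0, 0],
    "Bus":  [2, 1, 2, 1, 2],
    "Dep":  [2, 1, 1, 1, 1],
    "Edu":  [1, 1, 4, 1, 3],
})
X = data_rev.drop("Disc", axis=1)
y = data_rev["Disc"]

# Iterating over a DataFrame yields column labels, not rows, so zip(X, y)
# pairs the strings "Bus", "Dep", "Edu" with the first three labels.
pairs = list(zip(X, y))
print(pairs)

# Converting to NumPy arrays first makes the selection row-wise.
X_arr = X.to_numpy()
y_arr = y.to_numpy()
classes = np.unique(y_arr)
separated = [X_arr[y_arr == c] for c in classes]

# Per-class feature totals: one row per class, one column per feature.
count = np.array([rows.sum(axis=0) for rows in separated])
print(count)
```

With numeric rows in `separated`, the `.sum(axis = 0)` call no longer sees any strings.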

Step 3 is as follows:

feature_log_prob_ = np.log(count / count.sum(axis = 1)[np.newaxis].T)

This of course raises an error as well, because it depends on count.
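For illustration, a sketch of how step 3 behaves once `count` is a purely numeric array. The counts below are hypothetical placeholders, and no smoothing is applied, so a zero count would produce `-inf` in the log.

```python
import numpy as np

# Hypothetical per-class feature counts (rows: classes, columns: features).
count = np.array([[4, 3, 5],
                  [4, 3, 5]])

# Row totals as a column vector, using the same reshaping as in step 3.
row_totals = count.sum(axis=1)[np.newaxis].T      # shape (2, 1)

# keepdims=True gives the same column vector in a single call.
same_totals = count.sum(axis=1, keepdims=True)    # shape (2, 1)

# Each row is divided by its own total via broadcasting, then logged.
feature_log_prob_ = np.log(count / row_totals)
print(feature_log_prob_)
```

Exponentiating each row of `feature_log_prob_` gives probabilities that sum to 1 per class, which is a quick sanity check on the normalization.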

0 answers:

No answers yet