Question

I want to implement the AdaBoost algorithm in Python.

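I have the weak classifiers in a list named classifiers, and a vector _D that holds the distribution values for the current iteration. My code looks like this (the vectors are numpy arrays):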

for t in range(m):
  chosen_examples_indexes = []
  for i, d in enumerate(_D):
    if np.random.binomial(1, d) == 1:
      chosen_examples_indexes.append(i)
    training_examples = examples[chosen_examples_indexes]

My question is whether this use of _D is correct. If not, what is the correct way to use it?

Answer 1

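In its core version, AdaBoost uses the distribution D to weight your samples, not to sample from them. In the limit the two are nearly the same, but for a fixed-size sample they are completely different.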

So, in your notation, you would simply do

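# X, y: features and labels extracted from examples
for t in range(m):
    # scikit-learn estimators spell this keyword sample_weight (singular);
    # adjust the name if your classifiers use a different one
    classifiers[t].fit(X, y, sample_weight=_D)
    # update _D based on results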
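To make the "update _D based on results" step concrete, here is a minimal sketch of the weighted loop. It assumes labels in {-1, +1} and the standard discrete AdaBoost update; `fit_stump`, `predict_stump`, `adaboost` and `predict` are hypothetical helpers written for this illustration (a brute-force decision stump stands in for your weak classifiers), not part of any library.

```python
import numpy as np

def fit_stump(X, y, sample_weight):
    """Hypothetical weak learner: best one-feature threshold under the weights."""
    best = (np.inf, 0, 0.0, 1)  # (weighted error, feature, threshold, polarity)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                err = np.sum(sample_weight[pred != y])
                if err < best[0]:
                    best = (err, j, thr, pol)
    return best[1:]

def predict_stump(stump, X):
    j, thr, pol = stump
    return np.where(pol * (X[:, j] - thr) >= 0, 1, -1)

def adaboost(X, y, m):
    n = len(y)
    _D = np.full(n, 1.0 / n)  # start from the uniform distribution
    stumps, alphas = [], []
    for t in range(m):
        # the distribution is passed as weights, no resampling involved
        stump = fit_stump(X, y, sample_weight=_D)
        pred = predict_stump(stump, X)
        eps = np.sum(_D[pred != y])             # weighted training error
        eps = np.clip(eps, 1e-10, 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * np.log((1 - eps) / eps)   # vote of this weak classifier
        _D = _D * np.exp(-alpha * y * pred)     # up-weight misclassified examples
        _D = _D / _D.sum()                      # renormalise to a distribution
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, np.array(alphas), _D

def predict(stumps, alphas, X):
    scores = sum(a * predict_stump(s, X) for s, a in zip(stumps, alphas))
    return np.where(scores >= 0, 1, -1)
```

With a classifier that accepts per-sample weights, the call to `fit_stump` would be replaced by the `fit` call shown above; the update of _D stays the same.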

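If your classifier does not support sample weights, things are harder, and the solution is more or less what you wrote. Still, weighting should always be preferred when it is available, because it directly optimizes the quantity we care about rather than a stochastic approximation of it (obtained by sampling).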

for t in range(m):
    # boolean mask: keep example i with probability _D[i] (independent draws)
    chosen_examples_indexes = np.random.random(len(_D)) < _D
    training_examples = examples[chosen_examples_indexes]
    # X, y: features and labels extracted from training_examples
    classifiers[t].fit(X, y)
    # update _D based on results
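One caveat with independent per-example draws: since _D sums to 1, the expected number of selected examples is also 1, so most rounds would train on almost nothing. A common alternative, sketched below, is to draw a fixed-size sample with replacement using _D as the probabilities. `resample_by_distribution` is a hypothetical helper name, and drawing exactly n examples is an assumption for this illustration, not something the algorithm requires.

```python
import numpy as np

def resample_by_distribution(X, y, _D, rng=None):
    """Draw len(_D) examples with replacement, example i with probability _D[i]."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(_D)
    idx = rng.choice(n, size=n, replace=True, p=_D)
    return X[idx], y[idx], idx
```

The resampled `X[idx], y[idx]` can then be passed to `classifiers[t].fit`. The weighted error used to update _D should still be computed on the full training set, not on the resampled one.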

AdaBoost - how to use the distribution D
