Error when comparing samplers that combine over- and under-sampling

Time: 2021-07-12 19:11:32

Tags: python numpy machine-learning graph time-series

I am trying to compare my oversampling and undersampling algorithms.

This is my y numpy array:

[0. 0. 0. ... 0. 0. 0.] 

It contains only 1s and 0s: 99.57113470676805% of the values are 0 and 0.4288652932319543% are 1.
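Percentages like these can be computed directly from the label array. Below is a minimal NumPy sketch. `class_percentages` is a hypothetical helper name, and the toy labels are illustrative only, not the question's data:

```python
import numpy as np


def class_percentages(y):
    """Return {label: percentage of samples carrying that label}."""
    # np.unique with return_counts gives each distinct label and its frequency
    labels, counts = np.unique(y, return_counts=True)
    total = counts.sum()
    return {float(label): float(100.0 * c / total) for label, c in zip(labels, counts)}


# Toy label vector (hypothetical), just to show the call
y_toy = np.array([0.0, 0.0, 0.0, 1.0])
print(class_percentages(y_toy))
```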

This is my X numpy array:

[[ 9.99139870e+00  6.87505736e-01  8.18184694e-01  5.79211424e-03
   7.07254165e-02 -4.96940863e-02]
 [ 1.45842820e-02  8.90971353e-01  5.40819886e-02  4.78689597e-03
  -7.58403812e-01  1.25082521e-01]
 [ 1.45743243e-02  8.77439954e-01  3.24491931e-02  4.73968535e-03
  -5.17675263e-02 -5.86812372e-02]
 ...
 [ 1.81681846e-03  2.17873637e+00  7.85498395e-01  5.44274803e-04
  -4.03230077e-02  2.36304861e-02]
 [ 1.81637248e-03  2.22724182e+00  7.85498395e-01  5.74896405e-04
   2.43415000e-01 -2.68917605e-02]
 [ 1.81600743e-03  2.29634509e+00  7.85498395e-01  5.93269365e-04
   1.17457969e-01  1.15348925e-03]]

As shown above, X has 6 features, but the error says it has only 2. I don't know where to fix this so that the plots work.
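The mismatch can be reproduced without the imblearn parts. Inside `plot_decision_function`, the array passed to `clf.predict` is `np.c_[xx.ravel(), yy.ravel()]`, which always has exactly two columns. The pipeline, however, was fit on every column of `X`. The minimal sketch below assumes random data as a stand-in for `X` and uses a bare `LinearSVC` in place of the pipeline. `X_demo`, `y_demo`, and `clf_demo` are hypothetical names. The exact wording of the error message depends on the scikit-learn version:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# Hypothetical data with 6 features, standing in for the question's X
X_demo = rng.normal(size=(200, 6))
y_demo = (X_demo[:, 0] > 0).astype(int)

clf_demo = LinearSVC().fit(X_demo, y_demo)
print("features seen during fit:", clf_demo.n_features_in_)

# Same construction as in plot_decision_function: a grid over two features only
xx, yy = np.meshgrid(np.linspace(-3, 3, 50), np.linspace(-3, 3, 50))
grid = np.c_[xx.ravel(), yy.ravel()]  # shape (2500, 2)

raised = False
try:
    clf_demo.predict(grid)
except ValueError as err:
    # predict refuses input whose column count differs from the training data
    raised = True
    print("ValueError:", err)
```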

This is what I am trying to measure:
from collections import Counter

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from imblearn.over_sampling import SMOTE
from imblearn.combine import SMOTEENN, SMOTETomek
from imblearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def plot_decision_function(X, y, clf, ax):
    """Plot the decision function of the classifier and the original data"""
    plot_step = 0.02
    # The grid spans only the first two features of X
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(
        np.arange(x_min, x_max, plot_step), np.arange(y_min, y_max, plot_step)
    )

    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    ax.contourf(xx, yy, Z, alpha=0.4)
    ax.scatter(X[:, 0], X[:, 1], alpha=0.8, c=y, edgecolor="k")
    ax.set_title(f"Decision function for {clf[0].__class__.__name__}")


def plot_resampling(X, y, sampler, ax):
    """Plot the resampled dataset using the sampler."""
    X_res, y_res = sampler.fit_resample(X, y)
    ax.scatter(X_res[:, 0], X_res[:, 1], c=y_res, alpha=0.8, edgecolor="k")
    sns.despine(ax=ax, offset=10)
    ax.set_title(f"Resampling using {sampler.__class__.__name__}")
    return Counter(y_res)


samplers = [SMOTE(random_state=0), SMOTEENN(random_state=0), SMOTETomek(random_state=0)]

# One row per sampler: decision function on the left, resampled data on the right
fig, axs = plt.subplots(3, 2, figsize=(15, 25))
for ax, sampler in zip(axs, samplers):
    clf = make_pipeline(sampler, LinearSVC()).fit(X, y)
    plot_decision_function(X, y, clf, ax[0])
    plot_resampling(X, y, sampler, ax[1])
fig.tight_layout()

plt.show()

The error I get looks simple, but I can't find a way to fix it:

ValueError: X has 2 features per sample; expecting 10

ValueError                                Traceback (most recent call last)
 in <module>
      9 for ax, sampler in zip(axs, samplers):
     10     clf = make_pipeline(sampler, LinearSVC()).fit(X, y)
---> 11     plot_decision_function(X, y, clf, ax[0])
     12     plot_resampling(X, y, sampler, ax[1])
     13 fig.tight_layout()
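One possible workaround is sketched below as an assumption, not a confirmed fix. It keeps the classifier trained on all features and gives `predict` a grid with the same number of columns as `X`. Only the two plotted features vary across the grid, and every other feature is held at its column mean. `decision_grid` and the demo names are hypothetical:

```python
import numpy as np
from sklearn.svm import LinearSVC


def decision_grid(X, clf, cols=(0, 1), plot_step=0.02):
    """Evaluate clf on a 2-D grid over X[:, cols]; all other features are
    held at their column means so the grid has as many columns as X."""
    i, j = cols
    x_min, x_max = X[:, i].min() - 1, X[:, i].max() + 1
    y_min, y_max = X[:, j].min() - 1, X[:, j].max() + 1
    xx, yy = np.meshgrid(
        np.arange(x_min, x_max, plot_step), np.arange(y_min, y_max, plot_step)
    )
    # Start every grid point at the per-feature mean of X ...
    grid = np.tile(X.mean(axis=0), (xx.size, 1))
    # ... then overwrite the two plotted features with the grid coordinates
    grid[:, i] = xx.ravel()
    grid[:, j] = yy.ravel()
    Z = clf.predict(grid).reshape(xx.shape)
    return xx, yy, Z


# Hypothetical 6-feature data and classifier, only to exercise the helper
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 6))
y_demo = (X_demo[:, 0] + X_demo[:, 2] > 0).astype(int)
clf_demo = LinearSVC().fit(X_demo, y_demo)

xx, yy, Z = decision_grid(X_demo, clf_demo, plot_step=0.1)
print(xx.shape, Z.shape)
```

In the question's loop, this would replace the `predict` call inside `plot_decision_function`: for example, `xx, yy, Z = decision_grid(X, clf)` followed by `ax.contourf(xx, yy, Z, alpha=0.4)`. The contour is then only a slice through the model at the mean of the remaining features. If a purely 2-D picture is enough, another option is to fit each pipeline on `X[:, :2]` and pass that same two-column array to both plotting functions.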
