I am trying to compare my oversampling and undersampling algorithms.
This is my y numpy array:
[0. 0. 0. ... 0. 0. 0.]
It contains only 1s and 0s. The percentage of 0s is 99.57113470676805 and the percentage of 1s is 0.4288652932319543.
This is my X numpy array:
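For reference, class percentages like the ones above can be computed with `np.unique`. This is only a minimal sketch: `class_percentages` and `y_demo` are hypothetical names, and `y_demo` is a small made-up array, not the real `y`.

```python
import numpy as np


def class_percentages(y):
    """Return {label: percentage of samples} for a 1-D label array."""
    y = np.asarray(y)
    labels, counts = np.unique(y, return_counts=True)
    return {
        float(label): float(100.0 * count / y.size)
        for label, count in zip(labels, counts)
    }


# Hypothetical stand-in for the real y: 995 zeros and 5 ones.
y_demo = np.concatenate([np.zeros(995), np.ones(5)])
percentages = class_percentages(y_demo)
print(percentages)
```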
[[ 9.99139870e+00 6.87505736e-01 8.18184694e-01 5.79211424e-03
7.07254165e-02 -4.96940863e-02]
[ 1.45842820e-02 8.90971353e-01 5.40819886e-02 4.78689597e-03
-7.58403812e-01 1.25082521e-01]
[ 1.45743243e-02 8.77439954e-01 3.24491931e-02 4.73968535e-03
-5.17675263e-02 -5.86812372e-02]
...
[ 1.81681846e-03 2.17873637e+00 7.85498395e-01 5.44274803e-04
-4.03230077e-02 2.36304861e-02]
[ 1.81637248e-03 2.22724182e+00 7.85498395e-01 5.74896405e-04
2.43415000e-01 -2.68917605e-02]
[ 1.81600743e-03 2.29634509e+00 7.85498395e-01 5.93269365e-04
1.17457969e-01 1.15348925e-03]]
As shown above, X has 6 features, but the error says there are only 2. I don't know where to fix this so that the plots work.
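A quick way to see where a 2-feature array might come from is to compare the column count of X with the column count of a plotting grid built from two columns only, as `plot_decision_function` does with `np.meshgrid` and `np.c_`. The sketch below uses a made-up array `X_demo` with 6 columns (an assumption standing in for the real X) and a coarser grid step to keep it small.

```python
import numpy as np

# Hypothetical stand-in with the same number of columns as the X shown above.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 6))

# Grid built from columns 0 and 1 only, in the same way as the plotting helper.
xx, yy = np.meshgrid(
    np.arange(X_demo[:, 0].min() - 1, X_demo[:, 0].max() + 1, 0.5),
    np.arange(X_demo[:, 1].min() - 1, X_demo[:, 1].max() + 1, 0.5),
)
grid = np.c_[xx.ravel(), yy.ravel()]

print("columns in X_demo:", X_demo.shape[1])
print("columns in the grid passed to predict:", grid.shape[1])
```

If the two counts differ, a classifier fitted on the full array would presumably reject the grid at `predict` time.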
This is what I am trying to measure:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from collections import Counter
from imblearn.over_sampling import SMOTE
from imblearn.combine import SMOTEENN, SMOTETomek
from imblearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# One row of subplots per sampler: decision function on the left, resampled data on the right.
samplers = [SMOTE(random_state=0), SMOTEENN(random_state=0), SMOTETomek(random_state=0)]
fig, axs = plt.subplots(3, 2, figsize=(15, 25))
for ax, sampler in zip(axs, samplers):
    clf = make_pipeline(sampler, LinearSVC()).fit(X, y)
    plot_decision_function(X, y, clf, ax[0])
    plot_resampling(X, y, sampler, ax[1])
fig.tight_layout()
plt.show()
def plot_decision_function(X, y, clf, ax):
    """Plot the decision function of the classifier and the original data"""
    plot_step = 0.02
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(
        np.arange(x_min, x_max, plot_step), np.arange(y_min, y_max, plot_step)
    )
    # The grid has only two columns (features 0 and 1 of X).
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    ax.contourf(xx, yy, Z, alpha=0.4)
    ax.scatter(X[:, 0], X[:, 1], alpha=0.8, c=y, edgecolor="k")
    ax.set_title(f"Decision function for {clf[0].__class__.__name__}")


def plot_resampling(X, y, sampler, ax):
    """Plot the resampled dataset using the sampler."""
    X_res, y_res = sampler.fit_resample(X, y)
    ax.scatter(X_res[:, 0], X_res[:, 1], c=y_res, alpha=0.8, edgecolor="k")
    sns.despine(ax=ax, offset=10)
    ax.set_title(f"Resampling using {sampler.__class__.__name__}")
    return Counter(y_res)
The error I get is simple, but I can't find a way to fix it:
ValueError: X has 2 features per sample; expecting 10
ValueError Traceback (most recent call last)
in <module>
9 for ax, sampler in zip(axs, samplers):
10 clf = make_pipeline(sampler, LinearSVC()).fit(X, y)
---> 11 plot_decision_function(X, y, clf, ax[0])
12 plot_resampling(X, y, sampler, ax[1])
13 fig.tight_layout()
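One possible way around this kind of mismatch, assuming it is acceptable to visualise only two of the features, is to fit the classifier on the same two columns that the plotting grid is built from. The sketch below is not a confirmed fix for the code above: it uses made-up data (`X_demo`, `y_demo`), leaves out the imblearn sampler to stay short, and picks the first two columns arbitrarily. Note also that the message above mentions 10 expected features while the printed X shows 6 columns, so the exact count may depend on which X the pipeline was fitted on.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical data: 6 features, binary labels derived from the first two columns.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 6))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(float)

# Keep only the two columns that will be plotted, and fit on exactly those.
X_2d = X_demo[:, :2]
clf = LinearSVC().fit(X_2d, y_demo)

# Build the grid from the same two columns, with a coarse step for brevity.
xx, yy = np.meshgrid(
    np.arange(X_2d[:, 0].min() - 1, X_2d[:, 0].max() + 1, 0.5),
    np.arange(X_2d[:, 1].min() - 1, X_2d[:, 1].max() + 1, 0.5),
)

# The grid and the training data now both have two columns, so predict accepts it.
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
print(Z.shape == xx.shape)
```

In the code above, this would presumably correspond to passing something like `X[:, :2]` both to `make_pipeline(...).fit` and to the two plotting helpers. Which two columns to keep, or whether to project all features down to two first, depends on the data.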