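I am using the KMeans SMOTE technique to oversample my data. When I increase the number of targets (multiple columns/fields), it raises an error. If I pass only one target (a single column), the technique works. I want to pass multiple targets (multiple columns/fields). How can I do this?
Sample code: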
import numpy as np
from kmeans_smote import KMeansSMOTE
import pandas as pd
datasets = pd.read_csv('data.csv')
# Columns 0-3 are the features F1..F4; the remaining columns are the targets C1..C15
feature = datasets[datasets.columns[0:4]]
target = datasets[datasets.columns[4:]]
X, Y = feature, target
Y = Y.values
# Print how many instances each class has before oversampling
for label, count in zip(*np.unique(Y, return_counts=True)):
    print('Class {} has {} instances'.format(label, count))
kmeans_smote = KMeansSMOTE(kmeans_args={'n_clusters': 9}, smote_args={'k_neighbors': 10})
X_resampled, y_resampled = kmeans_smote.fit_sample(X, Y)
# Print how many instances each class has after oversampling
for label, count in zip(*np.unique(y_resampled, return_counts=True)):
    print('Class {} has {} instances after oversampling'.format(label, count))
print('y_resampled', y_resampled)
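One thing worth checking in the code above: when Y has several columns, np.unique(Y, return_counts=True) without an axis argument flattens the array, so the printed "classes" are individual cell values (such as 0 and 1) and not per-row targets. A minimal pure-Python sketch of the difference, using a small made-up Y (the values are illustrative only and are not taken from data.csv):

```python
from collections import Counter

# Hypothetical 3-row, 3-column target matrix (illustrative values only).
Y = [
    [1, 0, 1],
    [0, 0, 0],
    [1, 0, 1],
]

# Count every cell, which is what counting over a flattened 2-D array amounts to.
cell_counts = Counter(value for row in Y for value in row)

# Count whole rows, so each distinct combination of labels is one "class".
row_counts = Counter(tuple(row) for row in Y)

for label, count in sorted(cell_counts.items()):
    print('Cell value {} appears {} times'.format(label, count))
for combo, count in sorted(row_counts.items()):
    print('Combination {} has {} instances'.format(combo, count))
```

With a multi-column target, the row-wise count is probably the one that reflects how balanced the data really is.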
Sample data:
Here, "F1, F2, F3, F4" are the features and "C1, C2, C3, C4, C5, C6, C7, C8, C9, C10, C11, C12, C13, C14, C15" are the targets.
Data:
F1 F2 F3 F4 C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14 C15
160202 0 1 0 1 0 1 0 1 0 1 1 0 1 1 0 1 1 1
160578 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
162582 0 1 0 1 0 1 0 1 0 1 1 0 1 1 0 1 1 1
160286 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
160290 0 1 0 1 0 1 0 1 0 1 1 0 1 1 0 1 1 1
160204 0 1 -1 0 1 0 1 0 1 0 0 1 0 0 1 0 0 0
160298 0 1 -1 1 0 1 0 0 0 0 1 0 0 1 0 0 0 1
160602 1 0 -1 0 0 0 0 1 0 1 0 0 1 0 0 1 1 0
160206 1 0 0 0 0 0 0 1 0 1 0 0 1 0 0 1 1 0
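One possible workaround, offered only as a sketch and not as something the kmeans_smote package is confirmed to support: collapse the 15 target columns of each row into a single class id (often called a label powerset), oversample against that single 1-D target, then expand the ids back into columns. The helper names encode_targets and decode_targets are hypothetical, and the rows below are the C1..C15 values of the first four sample rows.

```python
def encode_targets(rows):
    """Map each distinct row of target values to an integer class id."""
    combo_to_id = {}
    class_ids = []
    for row in rows:
        key = tuple(row)
        if key not in combo_to_id:
            # Assign ids in order of first appearance: 0, 1, 2, ...
            combo_to_id[key] = len(combo_to_id)
        class_ids.append(combo_to_id[key])
    return class_ids, combo_to_id


def decode_targets(class_ids, combo_to_id):
    """Expand integer class ids back into the original target rows."""
    id_to_combo = {cid: combo for combo, cid in combo_to_id.items()}
    return [list(id_to_combo[cid]) for cid in class_ids]


# C1..C15 for the first four sample rows.
targets = [
    [1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1],
    [0] * 15,
    [1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1],
    [1] * 15,
]

# y_single is 1-D, the shape a single-target oversampler expects.
y_single, mapping = encode_targets(targets)
print(y_single)

# After oversampling, the resampled ids can be expanded the same way.
restored = decode_targets(y_single, mapping)
```

Under this assumption, the 1-D y_single would be passed as the target in place of the multi-column Y, and the resampled labels would go through decode_targets afterwards. Combinations that occur in only a few rows may not leave enough neighbours for the chosen k_neighbors value, so that setting might need to be lowered.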
Thanks