How to apply the KMeans SMOTE method to oversample data?

Time: 2019-03-15 10:24:29

Tags: python-3.x k-means oversampling

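I am using the KMeans SMOTE technique to oversample my data. It works when I pass only one target (a single column), but it throws an error when I pass more than one target (multiple columns/fields). I want to pass multiple targets. How can I do that?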
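SMOTE-style oversamplers expect one class label per sample (a 1-D y), while several 0/1 target columns form a multi-label matrix, which is the likely source of the error. A common workaround, not something the kmeans_smote package is known to do for you, is the "label powerset" encoding: treat each distinct row of target values as one class, oversample with that 1-D vector, then decode back. The sketch below assumes Y is a 2-D NumPy array of 0/1 values; the helpers encode_label_powerset and decode_label_powerset are hypothetical names, and the toy matrix is made up.

```python
import numpy as np


def encode_label_powerset(Y):
    """Map each distinct row of a 2-D label matrix to one integer class.

    Returns the 1-D class vector and the array of distinct rows,
    so that combos[y] reconstructs Y.
    """
    Y = np.asarray(Y)
    # axis=0 makes np.unique treat whole rows (label combinations) as items;
    # ravel() guards against NumPy versions that return a 2-D inverse here
    combos, y = np.unique(Y, axis=0, return_inverse=True)
    return y.ravel(), combos


def decode_label_powerset(y, combos):
    """Turn integer classes back into rows of the original label matrix."""
    return combos[np.asarray(y)]


# Hypothetical usage with the oversampler from the question (not executed here):
#   y_1d, combos = encode_label_powerset(Y)
#   X_res, y_res_1d = kmeans_smote.fit_sample(X, y_1d)
#   Y_res = decode_label_powerset(y_res_1d, combos)

# Made-up toy target matrix: 4 samples, 3 target columns
Y = np.array([[1, 0, 1], [0, 0, 0], [1, 0, 1], [1, 1, 1]])
y_1d, combos = encode_label_powerset(Y)
print(y_1d)  # [1 0 1 2]
print(decode_label_powerset(y_1d, combos))
```

Synthetic samples produced by SMOTE carry the label of the class they were generated for, so every resampled label is an existing combination and can be decoded. The trade-off is that rare combinations become tiny classes, which may be too small for k_neighbors=10.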

Sample code:

import numpy as np
import pandas as pd
from kmeans_smote import KMeansSMOTE

datasets = pd.read_csv('data.csv')
# F1-F4 are the features, C1-C15 are the targets
feature = datasets[datasets.columns[0:4]]
target = datasets[datasets.columns[4:]]

X = feature
Y = target.values  # 2-D array when several target columns are selected

for label, count in zip(*np.unique(Y, return_counts=True)):
    print('Class {} has {} instances'.format(label, count))

kmeans_smote = KMeansSMOTE(kmeans_args={'n_clusters': 9}, smote_args={'k_neighbors': 10})
# works with a single target column, raises an error with several
X_resampled, y_resampled = kmeans_smote.fit_sample(X, Y)

for label, count in zip(*np.unique(y_resampled, return_counts=True)):
    print('Class {} has {} instances after oversampling'.format(label, count))

print('y_resampled', y_resampled)
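With more than one target column, np.unique(Y, return_counts=True) in the counting loops flattens Y, so the "classes" it prints are just the values 0 and 1 counted over every cell of the matrix. This numpy-only sketch, on a made-up 4 x 3 toy matrix, contrasts that with counting positive samples per target column.

```python
import numpy as np

# Made-up 0/1 target matrix (rows = samples, columns = targets)
Y = np.array([
    [1, 0, 1],
    [0, 0, 0],
    [1, 0, 1],
    [1, 1, 1],
])

# What the question's loop effectively does: np.unique flattens Y,
# so it counts individual cells, not samples per class
values, cell_counts = np.unique(Y, return_counts=True)
for label, count in zip(values, cell_counts):
    print('Value {} appears in {} cells'.format(label, count))

# Positive samples per target column (one count per column)
col_counts = Y.sum(axis=0)
for j, count in enumerate(col_counts, start=1):
    print('Target column {} has {} positive samples'.format(j, count))
```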

Sample data:

Here, "F1, F2, F3, F4" are the features and "C1, C2, C3, C4, C5, C6, C7, C8, C9, C10, C11, C12, C13, C14, C15" are the targets.

Data:

F1     F2 F3 F4 C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14 C15
160202  0  1  0  1  0  1  0  1  0  1  1  0   1   1   0   1   1   1
160578  0  1  0  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0
162582  0  1  0  1  0  1  0  1  0  1  1  0   1   1   0   1   1   1
160286  0  1  0  1  1  1  1  1  1  1  1  1   1   1   1   1   1   1
160290  0  1  0  1  0  1  0  1  0  1  1  0   1   1   0   1   1   1
160204  0  1 -1  0  1  0  1  0  1  0  0  1   0   0   1   0   0   0
160298  0  1 -1  1  0  1  0  0  0  0  1  0   0   1   0   0   0   1
160602  1  0 -1  0  0  0  0  1  0  1  0  0   1   0   0   1   1   0
160206  1  0  0  0  0  0  0  1  0  1  0  0   1   0   0   1   1   0
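The nine sample rows can be used to check the column split and how many samples share each label combination. This numpy-only sketch hard-codes the rows above instead of reading data.csv, and takes F1-F4 as features (columns 0-3) and C1-C15 as targets (columns 4-18). If each distinct combination of C1-C15 were treated as one class, this sample would give only a handful of samples per class, while SMOTE needs more than k_neighbors samples in a class to find that many neighbours inside it; whether the full file has the same problem is not known from the question.

```python
import io

import numpy as np

# The nine sample rows from the question (header: F1-F4, C1-C15)
SAMPLE = """\
F1 F2 F3 F4 C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14 C15
160202 0 1 0 1 0 1 0 1 0 1 1 0 1 1 0 1 1 1
160578 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
162582 0 1 0 1 0 1 0 1 0 1 1 0 1 1 0 1 1 1
160286 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
160290 0 1 0 1 0 1 0 1 0 1 1 0 1 1 0 1 1 1
160204 0 1 -1 0 1 0 1 0 1 0 0 1 0 0 1 0 0 0
160298 0 1 -1 1 0 1 0 0 0 0 1 0 0 1 0 0 0 1
160602 1 0 -1 0 0 0 0 1 0 1 0 0 1 0 0 1 1 0
160206 1 0 0 0 0 0 0 1 0 1 0 0 1 0 0 1 1 0
"""

data = np.loadtxt(io.StringIO(SAMPLE), skiprows=1, dtype=int)
X = data[:, 0:4]  # features F1-F4 (F4 contains -1, so it is not a 0/1 target)
Y = data[:, 4:]   # targets C1-C15

# Count samples per distinct row of target values
combos, counts = np.unique(Y, axis=0, return_counts=True)
print('{} samples, {} distinct label combinations'.format(len(Y), len(combos)))
print('samples per combination:', sorted(counts.tolist()))

k_neighbors = 10  # value used in the question's smote_args
print('combinations with more than k_neighbors samples:',
      int((counts > k_neighbors).sum()))
```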

Thanks

0 answers