Question

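I am writing my own undersampling function because imblearn does not fully support multi-label classification (for example, it only accepts a one-dimensional y).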

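I want to iterate over X and y and delete one out of every 2 or 3 rows that belong to the majority class. The goal is a quick and dirty way to reduce the number of rows in the majority class.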

def undersample(X, y):
    counter = 0
    for index, row in y.iterrows():
        if row['rectangle_here'] == 0:
            counter += 1
            if counter > 3:
                counter = 0
                X.drop(index, inplace=True)
                y.drop(index, inplace=True)
    return X, y

But even with a small number of rows (~30,000), it crashes my kernel.
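One plausible contributor to the slowdown is that the loop calls `drop(..., inplace=True)` once per removed row while the frame is still being iterated. A minimal sketch of the same counting logic that collects the index labels first and drops them in a single call is below. It assumes X and y share the same index and that `rectangle_here == 0` marks the majority class; the function name `undersample_collect` and the `keep_run` parameter are hypothetical names introduced for illustration.

```python
import pandas as pd


def undersample_collect(X, y, column="rectangle_here", keep_run=3):
    """Drop every (keep_run + 1)-th row whose `column` value is 0.

    Assumes X and y share the same index. Returns new frames and
    leaves the inputs untouched.
    """
    to_drop = []
    counter = 0
    # Iterate over a single column instead of whole rows.
    for index, value in y[column].items():
        if value == 0:
            counter += 1
            if counter > keep_run:
                counter = 0
                to_drop.append(index)
    # One drop call for all collected labels.
    return X.drop(index=to_drop), y.drop(index=to_drop)
```

Returning new frames instead of mutating X and y in place also avoids changing a frame while it is being iterated.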

This is what y looks like; f1 is set whenever f2 or f3 is present.

So let's count the occurrences of 0 in f1 and delete every third 0 row:

                  f1      f2       f3
0                  0       0       0
1                  0       0       0
2                  0       0       0
3                  1       0       1
4                  0       0       0
5                  0       0       0
6                  0       0       0
7                  0       0       0
8                  0       0       0
9                  0       0       0
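The "count the zeros in f1, drop every third one" idea can also be sketched without a Python loop, using a boolean mask and a running count. This is only an illustration under stated assumptions: X and y share the same index, the function name `undersample_every_nth` and the parameter `n` are hypothetical, and the `feature` column is a placeholder standing in for the real X.

```python
import pandas as pd


def undersample_every_nth(X, y, column="f1", n=3):
    """Drop every n-th row whose `column` value is 0 (majority class)."""
    is_majority = y[column].eq(0)
    # Running count of majority rows seen so far: 1, 2, 3, ...
    running = is_majority.cumsum()
    # Mark a majority row for removal when its running count is a multiple of n.
    drop_mask = is_majority & (running % n == 0)
    return X.loc[~drop_mask], y.loc[~drop_mask]


# The sample y from the question.
y = pd.DataFrame(
    {
        "f1": [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
        "f2": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        "f3": [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    }
)
X = pd.DataFrame({"feature": range(10)})  # placeholder features (assumption)

X_small, y_small = undersample_every_nth(X, y)
print(y_small.index.tolist())
```

On this sample, the third, sixth and ninth majority rows (index 2, 6 and 9) are removed and row 3, the only minority row, is kept. Note that the loop in the question drops every fourth majority row (`counter > 3`); passing `n=4` would match that behaviour.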

Undersampling a multi-label imbalanced dataset in pandas

0 answers: