I have a training dataset and its corresponding labels, as shown below.
Data array X:
X =[ [1,0,1,1,1,0 ],
[1,0,1,0,1,0 ],
[1,0,0,1,1,1 ],
[1,0,1,1,1,0 ],
[0,0,0,1,1,1 ],
[1,0,0,1,0,1 ],
[0,1,0,1,1,0 ],
[1,0,0,1,1,1 ]]
Labels Y:
Y= [ ['YES'],
['NO'],
['YES'],
['YES'],
['YES'],
['NO'],
['YES'],
['NO'],]
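I want to balance the classes, so I want to subset the rows of X according to their labels in Y. Since the data is already randomly sampled, I only want to keep the first 3 rows of X whose label is Y = 'YES', while keeping every row whose label is Y = 'NO'. The subsetted data X and its labels should look like this: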
Sub_X = [ [1,0,1,1,1,0 ],
[1,0,1,0,1,0 ],
[1,0,0,1,1,1 ],
[1,0,1,1,1,0 ],
[1,0,0,1,0,1 ],
[1,0,0,1,1,1 ]]
Labels after subsetting:
Sub_Y = [ ['YES'],
['NO'],
['YES'],
['YES'],
['NO'],
['NO'],]
The main idea of the code I wrote is as follows:
Sub_X = X[Y=='YES'][:3,]
Sub_Y = Y[Y=='YES'][:3,]
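My idea is to filter X by label and then select the first 3 rows, but the problem is that the rows labelled 'NO' should all be kept as they are in the original. Can anyone give me an idea for solving this?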
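
One possible way to express this idea is sketched below. It assumes X and Y are NumPy arrays (the question does not say so explicitly) and that the original row order should be preserved, as in the expected output above. The names `labels`, `yes_idx`, `no_idx`, and `keep` are only illustrative.

```python
import numpy as np

X = np.array([[1, 0, 1, 1, 1, 0],
              [1, 0, 1, 0, 1, 0],
              [1, 0, 0, 1, 1, 1],
              [1, 0, 1, 1, 1, 0],
              [0, 0, 0, 1, 1, 1],
              [1, 0, 0, 1, 0, 1],
              [0, 1, 0, 1, 1, 0],
              [1, 0, 0, 1, 1, 1]])
Y = np.array([['YES'], ['NO'], ['YES'], ['YES'],
              ['YES'], ['NO'], ['YES'], ['NO']])

# Flatten the (8, 1) label column to shape (8,) so it lines up with the rows of X.
labels = Y.ravel()

# Row indices of the first 3 'YES' samples and of all 'NO' samples.
yes_idx = np.flatnonzero(labels == 'YES')[:3]
no_idx = np.flatnonzero(labels == 'NO')

# Merge both index sets and sort them to keep the original row order.
keep = np.sort(np.concatenate([yes_idx, no_idx]))

Sub_X = X[keep]
Sub_Y = Y[keep]
```

Working with row indices instead of chaining boolean masks means the 'YES' and 'NO' rows can be selected separately and then recombined, so the same `keep` array indexes both X and Y consistently.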