Question

I have a CSV dataset that looks like this:

class_label,image_location
1, /some/loc0
2, /some/loc1
0, /some/loc2
1, /some/loc4

Here class_label is my class label, and target is my input to the NN.
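The sample rows have a space after each comma, which is easy to trip over when reading the file. Below is a minimal sketch using the standard-library `csv` module. It assumes a header row naming exactly `class_label` and `image_location`, and the helper name `read_labels_and_paths` is hypothetical. `skipinitialspace=True` drops the blank that follows each delimiter.

```python
import csv
import io
from typing import List, TextIO, Tuple


def read_labels_and_paths(f: TextIO) -> Tuple[List[int], List[str]]:
    """Read a class_label,image_location CSV into parallel lists.

    Hypothetical helper: assumes a header row naming both columns.
    """
    # skipinitialspace=True strips the blank that follows each comma,
    # so " /some/loc0" is read as "/some/loc0".
    reader = csv.DictReader(f, skipinitialspace=True)
    labels: List[int] = []
    paths: List[str] = []
    for row in reader:
        labels.append(int(row["class_label"]))
        paths.append(row["image_location"])
    return labels, paths


sample = io.StringIO(
    "class_label,image_location\n"
    "1, /some/loc0\n"
    "2, /some/loc1\n"
    "0, /some/loc2\n"
    "1, /some/loc4\n"
)
labels, paths = read_labels_and_paths(sample)
print(labels)  # [1, 2, 0, 1]
print(paths)   # ['/some/loc0', '/some/loc1', '/some/loc2', '/some/loc4']
```

`pandas.read_csv(path, skipinitialspace=True)` accepts the same option if you prefer to keep the data in a DataFrame.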

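I want to use a data loader to split this into a training set and a test set, with stratified sampling over every class in the CSV (20 classes, 4 image_locations per class).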
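The split does not have to happen inside the data loader. Stratifying means splitting each class's row indices separately, so that every class keeps the same train/test ratio. Below is a minimal stdlib sketch, assuming integer class labels, with a hypothetical helper named `stratified_split`. With 4 images per class and `test_size=0.2`, `round(0.8)` sends exactly 1 image per class to the test set.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int], test_size: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split row indices into train/test, class by class.

    Hypothetical helper: each class contributes round(test_size * n_class)
    of its rows to the test set, so every class keeps the same ratio.
    """
    rng = random.Random(seed)  # seeded so the split is reproducible
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx: List[int] = []
    test_idx: List[int] = []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)  # random order within the class
        n_test = round(test_size * len(idxs))
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


# 20 classes x 4 images each, as in the question.
labels = [c for c in range(20) for _ in range(4)]
train_idx, test_idx = stratified_split(labels, test_size=0.2, seed=42)
print(len(train_idx), len(test_idx))  # 60 20
```

scikit-learn's `train_test_split(indices, test_size=0.2, stratify=labels)` is the library route. As I understand it, though, it sizes the test set over the whole dataset (about 16 of 80 rows here), so with only 4 images per class a few classes may get no test image at all. Either way, the two index lists can then be wrapped with `torch.utils.data.Subset` or handed to a `SubsetRandomSampler`, giving one `DataLoader` per split.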

While googling I came across some pandas and scikit-learn solutions, so I started with something like this:

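sss = StratifiedShuffleSplit(df['event'], n_iter=1, test_size=0.2)

I think this gives me a train/test split that I can then use with my dataset. However, I'm not sure this is an elegant solution, so I'd like to know whether anyone can point me in the right direction.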
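Once you have train and test index lists, PyTorch mainly needs the dataset protocol. A map-style dataset only has to implement `__len__` and `__getitem__`, and `torch.utils.data.Subset(dataset, indices)` restricts it to the given indices. The stdlib-only sketch below mimics that shape so it runs without PyTorch. `CsvImageDataset` and `IndexSubset` are hypothetical names. In a real project you would subclass `torch.utils.data.Dataset`, load the image inside `__getitem__`, and use `Subset` directly.

```python
from typing import Sequence, Tuple


class CsvImageDataset:
    """Map-style dataset over (image_location, class_label) pairs.

    Hypothetical stand-in for a torch.utils.data.Dataset subclass; it
    returns the path instead of loading the image so it runs anywhere.
    """

    def __init__(self, paths: Sequence[str], labels: Sequence[int]) -> None:
        assert len(paths) == len(labels)
        self.paths = list(paths)
        self.labels = list(labels)

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, i: int) -> Tuple[str, int]:
        # A real version would open and transform the image here.
        return self.paths[i], self.labels[i]


class IndexSubset:
    """Hypothetical stand-in for torch.utils.data.Subset."""

    def __init__(self, dataset: CsvImageDataset, indices: Sequence[int]) -> None:
        self.dataset = dataset
        self.indices = list(indices)

    def __len__(self) -> int:
        return len(self.indices)

    def __getitem__(self, i: int) -> Tuple[str, int]:
        # Map the position inside the subset to a row of the full dataset.
        return self.dataset[self.indices[i]]


full = CsvImageDataset(
    ["/some/loc0", "/some/loc1", "/some/loc2", "/some/loc4"], [1, 2, 0, 1]
)
# Indices would normally come from a stratified split.
train = IndexSubset(full, [0, 1, 2])
test = IndexSubset(full, [3])
print(len(train), len(test))  # 3 1
print(test[0])                # ('/some/loc4', 1)
```

About the `sss` line: passing the labels to the constructor together with `n_iter` matches the older `sklearn.cross_validation` API. In `sklearn.model_selection` the equivalent is `StratifiedShuffleSplit(n_splits=1, test_size=0.2)`, and you iterate over `.split(X, y)` to get `(train_index, test_index)` pairs.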

Stratified sampling for a train/test split
