Question

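I am working with AWS SageMaker and trying to recreate a model of my own, using the movie genre prediction example as a starting point.

Here is the code: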

import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

def split(df, test_size):
    data = df.values
    # Label columns are everything except the metadata columns.
    data_y = df.drop(['luogo', 'testo', 'lingua'], axis=1).values
    # StratifiedShuffleSplit does not work with one-hot encoded / multiple
    # labels, so do the split on the basis of the argmax labels.
    data_y = np.argmax(data_y, axis=1)
    stratified_split = StratifiedShuffleSplit(n_splits=2,
                                              test_size=test_size,
                                              random_state=42)
    # With n_splits=2 the loop runs twice; the last split is the one returned.
    for train_index, test_index in stratified_split.split(data, data_y):
        train, test = df.iloc[train_index], df.iloc[test_index]
    return train, test

train, test = split(df, 0.33)
# Split the train set further into train and validation
train, validation = split(train, 0.2)

This is the DataFrame:

This is a sample DataFrame:

This is the error:

ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.

How should I modify my df?

Note that the sample df has multiple '1's in the same row.

Answer 1

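It seems that data_y contains a class with only one entry associated with it, where at least 2 are expected.

"Note that the sample df has multiple '1's in the same row."

The error message is telling you that your data has a particular class with only one data entry (row), not that an entry has only one feature (column) set.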
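To see which class is the problem, it can help to count rows per derived class. The following is a minimal sketch: the DataFrame is a made-up stand-in (the `label_*` column names and all values are assumptions, only `luogo`, `testo` and `lingua` come from the question), and it applies the same `np.argmax` reduction that `split()` uses.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's frame: three metadata columns
# followed by multi-hot label columns (label names and values are invented).
df = pd.DataFrame({
    "luogo": ["a", "b", "c", "d", "e"],
    "testo": ["t1", "t2", "t3", "t4", "t5"],
    "lingua": ["it", "it", "en", "en", "it"],
    "label_0": [1, 1, 0, 0, 1],
    "label_1": [0, 1, 1, 0, 0],
    "label_2": [0, 0, 0, 1, 0],
})

labels = df.drop(["luogo", "testo", "lingua"], axis=1).values
# Same reduction as in split(): each row gets the index of its first maximal label,
# so a row with several 1s still counts toward exactly one class.
classes = np.argmax(labels, axis=1)

# Number of rows per derived class; a count of 1 is what triggers the ValueError.
counts = pd.Series(classes).value_counts().sort_index()
print(counts)

rare = counts[counts < 2].index.tolist()
print("classes with fewer than 2 rows:", rare)
```

Note that the second row has two 1s but is still assigned to a single class, which is why multiple 1s per row are not what the error is about.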

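Because of how a stratified split works, it tries to preserve each class's proportion in both the training set and the test set. To do that, every class needs at least 2 entries, so that each of the two sets can receive at least one. Note that for StratifiedShuffleSplit, n_splits only sets how many times the shuffle-and-split is repeated; the two-member minimum applies regardless of its value. Since one of your classes has a single entry, that entry cannot be divided between the two sets.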
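A small reproduction may make this concrete. This sketch uses toy label arrays (invented for illustration) and a hypothetical helper, `try_split`, to show that a class with a single row fails whatever `n_splits` is, while two rows per class is enough.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

X = np.zeros((4, 1))  # feature values are irrelevant to the class-count check

def try_split(y, n_splits):
    """Return True if the stratified split succeeds, False if it raises ValueError."""
    splitter = StratifiedShuffleSplit(n_splits=n_splits, test_size=0.5, random_state=42)
    try:
        # split() is a generator, so the check only runs once it is iterated.
        list(splitter.split(X, y))
        return True
    except ValueError:
        return False

singleton = np.array([0, 0, 0, 1])  # class 1 has a single row
balanced = np.array([0, 0, 1, 1])   # every class has two rows

print("singleton, n_splits=1:", try_split(singleton, n_splits=1))
print("singleton, n_splits=2:", try_split(singleton, n_splits=2))
print("balanced,  n_splits=2:", try_split(balanced, n_splits=2))
```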

The solution is either to add more data for that class or to remove that class's data.
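If removing the rare class is acceptable, one possible way to do it before calling `split()` is sketched below. `drop_rare_classes` and its `min_count` parameter are hypothetical names introduced here, and the sample frame is invented; only the three metadata column names come from the question.

```python
import numpy as np
import pandas as pd

def drop_rare_classes(df, meta_cols, min_count=2):
    """Return df without rows whose argmax-derived class has fewer than min_count rows."""
    labels = df.drop(meta_cols, axis=1).values
    # Derive one class per row the same way split() does.
    classes = pd.Series(np.argmax(labels, axis=1), index=df.index)
    # For every row, look up how many rows share its class.
    rows_in_class = classes.map(classes.value_counts())
    return df[rows_in_class >= min_count]

# Made-up example data: class 0 has three rows, classes 1 and 2 have one each.
df = pd.DataFrame({
    "luogo": ["a", "b", "c", "d", "e"],
    "testo": ["t1", "t2", "t3", "t4", "t5"],
    "lingua": ["it", "it", "en", "en", "it"],
    "label_0": [1, 1, 0, 0, 1],
    "label_1": [0, 1, 1, 0, 0],
    "label_2": [0, 0, 0, 1, 0],
})

filtered = drop_rare_classes(df, ["luogo", "testo", "lingua"])
print(filtered.index.tolist())
```

Because the question splits the training set a second time, a class with exactly 2 rows could still end up with a single row in train and fail the second split, so a larger `min_count` may be needed in practice.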

AWS SageMaker: ValueError: The least populated class in y has only 1 member
