ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.

Posted: 2017-12-17 22:47:24

Tags: python scikit-learn

When I try to run this snippet with my own CSV file:

import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

data_df = pd.read_csv("movies_genres_en.csv", delimiter='\t')

# split the data, leave 1/3 out for testing
data_x = data_df[['plot']].to_numpy()
data_y = data_df.drop(['title', 'plot', 'plot_lang'], axis=1).to_numpy()
stratified_split = StratifiedShuffleSplit(n_splits=2, test_size=0.33)
# each iteration overwrites the previous split, so only the last one is kept
for train_index, test_index in stratified_split.split(data_x, data_y):
    x_train, x_test = data_x[train_index], data_x[test_index]
    y_train, y_test = data_y[train_index], data_y[test_index]

# data_x = overviews
# data_y = values from all the genre columns ('Action', 'Adventure', 'Fantasy', ...), e.g. 1 0 0 ...
# transform the matrix of plots into lists to pass to a TfidfVectorizer
train_x = [x[0].strip() for x in x_train.tolist()]
test_x = [x[0].strip() for x in x_test.tolist()]

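I keep getting this error:

ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.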
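A likely cause, based on how the scikit-learn source handles a 2-D y (behaviour may differ between versions): when y is a multi-label indicator matrix, StratifiedShuffleSplit does not stratify per genre column. It joins each row into a string and treats every distinct genre combination as its own class, so a combination shared by only one movie triggers this error even if every individual genre has many members. The sketch below reproduces that with hypothetical toy data (three made-up genre columns); it is an illustration, not the asker's dataset.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical indicator matrix, columns = Action, Comedy, Drama.
# Every column has three 1s, but the row [1, 1, 1] occurs only once.
y = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 1, 1],
    [1, 1, 1],
])
X = np.arange(len(y)).reshape(-1, 1)  # dummy features, one per row

splitter = StratifiedShuffleSplit(n_splits=2, test_size=0.33, random_state=0)
try:
    # split() is lazy; the class-size check runs when the first split is drawn
    next(splitter.split(X, y))
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Per-column counts are therefore not the relevant quantity; what matters is how many rows share each exact combination of genre values.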

Every class in my data has more than 2 instances. Here is the info about my CSV values:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4643 entries, 0 to 4642
Data columns (total 20 columns):
title              4643 non-null object
Action             4643 non-null object
Adventure          4643 non-null object
Fantasy            4643 non-null object
Science Fiction    4643 non-null object
Crime              4643 non-null object
Drama              4643 non-null object
Thriller           4643 non-null object
Animation          4643 non-null object
Family             4643 non-null object
Western            4643 non-null object
Comedy             4643 non-null object
Romance            4643 non-null object
Horror             4643 non-null object
Mystery            4643 non-null object
History            4643 non-null object
War                4643 non-null object
Music              4643 non-null object
Documentary        4643 non-null object
overview           4639 non-null object
dtypes: object(20)
memory usage: 725.5+ KB
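To check whether rare genre combinations are the problem, one could count the distinct rows of the genre columns, mirroring the row-to-string mapping StratifiedShuffleSplit appears to use for a 2-D y. The helpers label_combination_counts and drop_rare_combinations below are hypothetical names, and the toy frame is made-up data; on the real file, the genre columns would presumably be every column except title and overview.

```python
import pandas as pd


def label_combination_counts(labels: pd.DataFrame) -> pd.Series:
    """Count how many rows share each exact combination of label values.

    Each row is joined into one string, so each distinct string is one
    'class' in the sense StratifiedShuffleSplit checks against the minimum of 2.
    """
    keys = labels.astype(str).apply(" ".join, axis=1)
    return keys.value_counts()


def drop_rare_combinations(df: pd.DataFrame, label_cols: list, min_count: int = 2) -> pd.DataFrame:
    """Keep only rows whose label combination occurs at least min_count times."""
    keys = df[label_cols].astype(str).apply(" ".join, axis=1)
    combo_sizes = keys.map(keys.value_counts())  # size of each row's combination
    return df[combo_sizes >= min_count]


# Hypothetical toy frame, not the real movies_genres_en.csv.
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "Action": [1, 1, 0, 0, 1],
    "Comedy": [0, 0, 1, 1, 1],
})
genre_cols = ["Action", "Comedy"]

counts = label_combination_counts(toy[genre_cols])
print(counts)  # the combination "1 1" appears only once
kept = drop_rare_combinations(toy, genre_cols)
print(kept["title"].tolist())
```

Dropping rows is only one workaround: it discards data, and the resulting split is still stratified on whole combinations rather than on individual genres. Separately, the info() output shows 4 missing overview values; x[0].strip() would raise AttributeError on a NaN, so dropping rows with a missing text value (for example with dropna on that column) before splitting may also be needed.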

The first five rows of the CSV file:

data_df.head():

[screenshot not preserved]

0 answers