negative dimensions are not allowed (SMOTEENN)

Date: 2019-01-29 08:47:39

Tags: python machine-learning imblearn optuna

I am working with imbalanced data: two classes, positive:negative = 9:1.

So I tried SMOTEENN together with Optuna, using Optuna to find a suitable sampling_strategy value.
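For a binary target, imbalanced-learn documents a float sampling_strategy as the desired ratio of minority-class samples to majority-class samples after resampling. The sketch below assumes the over-sampling step turns that float into a per-class count as int(n_majority * sampling_strategy - n_minority) (my reading of the library's logic; check it against your installed version). Under that assumption, any ratio below the current minority/majority ratio (about 1/9 for 9:1 data) asks SMOTE for a negative number of new samples. The helper name samples_to_generate and the label counts are hypothetical.

```python
from collections import Counter


def samples_to_generate(y, sampling_strategy):
    """Hypothetical helper: number of synthetic minority samples a float
    sampling_strategy would request, ASSUMING the formula
    int(n_majority * sampling_strategy - n_minority)."""
    counts = Counter(y)
    if len(counts) != 2:
        raise ValueError("a float sampling_strategy only applies to binary targets")
    n_majority = max(counts.values())
    n_minority = min(counts.values())
    return int(n_majority * sampling_strategy - n_minority)


# Hypothetical 9:1 labels (900 positives, 100 negatives)
y = [1] * 900 + [0] * 100
print(samples_to_generate(y, 0.5))  # 350
print(samples_to_generate(y, 0.1))  # -10: the target ratio is below 100/900
```

With the search range in the question starting at 0.1, the lowest suggested values would fall into this negative case.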

However, I got this error:

Setting trial status as TrialState.FAIL because of the following error:
ValueError('negative dimensions are not allowed')

Traceback (most recent call last):
  File "D:\Anacon\envs\adisonax\lib\site-packages\optuna\study.py", line 409, in _run_trial
    result = func(trial)
  File "C:/Users/adisonax/Desktop/sotuken env/comprev/venv/No3.py", line 27, in objective
    X_res, y_res = sme.fit_resample(X_train_bow, Y_train)
  File "D:\Anacon\envs\adisonax\lib\site-packages\imblearn\base.py", line 85, in fit_resample
    output = self._fit_resample(X, y)
  File "D:\Anacon\envs\adisonax\lib\site-packages\imblearn\combine\_smote_enn.py", line 128, in _fit_resample
    X_res, y_res = self.smote_.fit_resample(X, y)
  File "D:\Anacon\envs\adisonax\lib\site-packages\imblearn\base.py", line 85, in fit_resample
    output = self._fit_resample(X, y)
  File "D:\Anacon\envs\adisonax\lib\site-packages\imblearn\over_sampling\_smote.py", line 796, in _fit_resample
    return self._sample(X, y)
  File "D:\Anacon\envs\adisonax\lib\site-packages\imblearn\over_sampling\_smote.py", line 814, in _sample
    X_class, nns, n_samples, 1.0)
  File "D:\Anacon\envs\adisonax\lib\site-packages\imblearn\over_sampling\_smote.py", line 108, in _make_samples
    low=0, high=len(nn_num.flatten()), size=n_samples)
  File "mtrand.pyx", line 994, in mtrand.RandomState.randint
  File "mtrand.pyx", line 995, in mtrand.RandomState.randint
  File "randint_helpers.pxi", line 202, in mtrand._rand_int32
ValueError: negative dimensions are not allowed
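The last frames show the exception is raised inside NumPy's randint, which SMOTE's _make_samples calls with size=n_samples, rather than while reading the values stored in X_train_bow. A minimal reproduction, assuming only NumPy (the seed and bounds are arbitrary), is to request a negative number of random integers:

```python
import numpy as np

rng = np.random.RandomState(0)

try:
    # size plays the role of n_samples in SMOTE's _make_samples
    rng.randint(low=0, high=10, size=-10)
except ValueError as exc:
    print("ValueError:", exc)
```

Because this failure depends on the requested size rather than on the feature dtype, casting the features to a smaller float type would not be expected to change it.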

My dataset has roughly 7000 sentences.

X_train_bow is a csr_matrix of shape (7851, 195); its data is an ndarray with dtype float64.

I searched online and suspected an overflow (float64).

So I tried changing the dtype to float32 and float16, like this:

X_train_bow_new = X_train_bow.astype(np.float32, casting='same_kind')

But the result was the same...

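import optuna
from imblearn.combine import SMOTEENN
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

def objective(trial):
    # Search the SMOTEENN ratio in [0.1, 1.0] with a step of 0.01
    sampling_strategy = trial.suggest_discrete_uniform('sampling_strategy', 0.1, 1.0, 0.01)

    # Resample the training set only
    sme = SMOTEENN(sampling_strategy=sampling_strategy)
    X_res, y_res = sme.fit_resample(X_train_bow, Y_train)

    # Train multinomial Naive Bayes on the resampled data and score it on the test set
    mnb = MultinomialNB()
    mnb.fit(X_res, y_res)
    y_pred = mnb.predict(X_test_bow)
    # Optuna minimizes by default, so return the error rate
    return 1.0 - accuracy_score(Y_test, y_pred)

# TF-IDF features; new_X_train, X_test, Y_train and Y_test are prepared elsewhere
new_vec = TfidfVectorizer(token_pattern=u'(?u)\\b\\w+\\b', norm='l2')
X_train_bow = new_vec.fit_transform(new_X_train)
X_test_bow = new_vec.transform(X_test)

study = optuna.create_study()
study.optimize(objective, n_trials=100)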
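One hedged reading of the traceback is that SMOTE was asked to generate a negative number of samples, which would happen if sampling_strategy is below the current minority/majority ratio (about 1/9 for 9:1 data, so 0.10 and 0.11 would be too small). Assuming that is the cause, one option is to start the search at the smallest ratio the training labels allow instead of a fixed 0.1. The helper below is a hypothetical, standard-library-only sketch: it rounds n_minority / n_majority up to the 0.01 grid used in the objective, with integer arithmetic to avoid floating-point rounding. Its result could be used as the lower bound of trial.suggest_discrete_uniform (or trial.suggest_float with step=0.01 in newer Optuna releases).

```python
from collections import Counter


def min_sampling_strategy(y, steps_per_unit=100):
    """Hypothetical helper: smallest ratio on a 1/steps_per_unit grid that is
    >= n_minority / n_majority. ASSUMES the per-class count SMOTE requests is
    int(n_majority * ratio - n_minority), so this ratio keeps it non-negative."""
    counts = Counter(y)
    n_majority = max(counts.values())
    n_minority = min(counts.values())
    # Integer ceiling of n_minority * steps_per_unit / n_majority
    n_steps = -(-n_minority * steps_per_unit // n_majority)
    return n_steps / steps_per_unit


# Hypothetical 9:1 labels; in the question this would be Y_train
y = [1] * 900 + [0] * 100
low = min_sampling_strategy(y)
print(low)  # 0.12
# e.g. trial.suggest_discrete_uniform('sampling_strategy', low, 1.0, 0.01)
```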

If everything works, the output should look like this:

Finished a trial resulted in value: 0.2339449541284404. Current best value is 0.2339449541284404 with parameters: {'sampling_strategy': 0.54}.

Finished a trial resulted in value: 0.2884811416921509. Current best value is 0.2339449541284404 with parameters: {'sampling_strategy': 0.54}.

and so on...

I would appreciate it if anyone could explain what is going on.

Thanks

0 answers