print(x)
Here `x` holds the independent variables (features).
       Restaurant  Cuisines  Average_Cost    Rating     Votes   Reviews      Area
0        3.526361  0.693147      5.303305  1.504077  2.564949  1.609438  7.214504
1        1.386294  4.127134      4.615121  1.504077  2.484907  1.609438  5.905362
2        2.772589  1.386294      5.017280  1.526056  4.605170  3.433987  6.131226
3        3.912023  2.833213      5.525453  1.547563  5.176150  4.564348  7.643483
4        3.526361  2.708050      5.303305  1.435085  5.948035  5.046646  6.126869
...           ...       ...           ...       ...       ...       ...       ...
11089    3.912023  0.693147      5.525453  1.648659  5.789960  5.046646  3.135494
11090    1.386294  6.028279      4.615121  1.526056  3.610918  2.833213  7.643483
11091    1.386294  2.397895      4.615121  1.504077  3.828641  2.944439  5.814131
11092    1.386294  6.028279      4.615121  1.410987  3.218876  2.302585  5.905362
11093    1.386294  6.028279      4.615121  1.029619  0.000000  0.000000  5.564520
11094 rows × 7 columns
`print(y.value_counts())`
Here `y` is the target variable, and it has multiple classes.
30 minutes 7406
45 minutes 2665
65 minutes 923
120 minutes 62
20 minutes 20
80 minutes 14
10 minutes 4
Name: Delivery_Time, dtype: int64
After examining the target variable, we can see that the "30 minutes" class has far more samples than the other classes.
To balance the classes, I tried SMOTETomek to oversample my data. Below is the code I used and the error I got.
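The imbalance can be quantified directly from the `value_counts()` output above. A minimal sketch in plain Python, with the class counts copied by hand from that output (the name `counts` is illustrative, not from the original code):

```python
# Class counts copied from the y.value_counts() output above
counts = {
    "30 minutes": 7406,
    "45 minutes": 2665,
    "65 minutes": 923,
    "120 minutes": 62,
    "20 minutes": 20,
    "80 minutes": 14,
    "10 minutes": 4,
}

total = sum(counts.values())                 # should equal the number of rows in x
majority = max(counts, key=counts.get)       # label of the largest class

for label, n in counts.items():
    share = n / total
    ratio = counts[majority] / n             # how many times larger the majority class is
    print(f"{label:>12}: {n:5d} ({share:6.2%}), majority/class = {ratio:8.1f}")
```

The counts sum to 11094, matching the row count of `x`; the majority class holds about 66.8% of the rows, while "10 minutes" has only 4 samples, roughly 1850 times fewer.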
from imblearn.combine import SMOTETomek
smk = SMOTETomek(ratio = 1)
x_res, y_res = smk.fit_sample(x,y)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-54-426e8b86623d> in <module>()
1 from imblearn.combine import SMOTETomek
2 smk = SMOTETomek(ratio = 1)
----> 3 x_res, y_res = smk.fit_sample(x,y)
2 frames
/usr/local/lib/python3.6/dist-packages/imblearn/utils/_validation.py in _sampling_strategy_float(sampling_strategy, y, sampling_type)
311 if type_y != 'binary':
312 raise ValueError(
--> 313 '"sampling_strategy" can be a float only when the type '
314 'of target is binary. For multi-class, use a dict.')
315 target_stats = _count_class_sample(y)
ValueError: "sampling_strategy" can be a float only when the type of target is binary. For multi-class, use a dict.
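Answer 0 (score: 1)
You can look at the actual implementation used by SMOTE:
https://github.com/scikit-learn-contrib/imbalanced-learn/blob/master/imblearn/utils/_validation.py#L355
You just need to pass a dict, as the error message says. However, SMOTE already handles the multi-class setting internally.
For example: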
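from imblearn.over_sampling import SMOTE
# "minority" over-samples only the smallest class; the default "auto"
# (= "not majority") over-samples every class except the majority one
smote = SMOTE(sampling_strategy="minority")
# fit_sample is the old name; recent imbalanced-learn releases use fit_resample
X, Y = smote.fit_resample(x_train, y_train)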
From the `sampling_strategy` documentation: "When dict, the keys correspond to the targeted classes. The values correspond to the desired number of samples for each targeted class."
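Following the error message, the float `ratio = 1` can be replaced by a dict. A minimal sketch of building one in plain Python, assuming you want every class over-sampled up to the majority count (or up to a smaller `target_count`); `build_sampling_strategy` is a hypothetical helper, not part of imbalanced-learn:

```python
from collections import Counter


def build_sampling_strategy(y, target_count=None):
    """Return a dict usable as sampling_strategy for an over-sampler.

    Each class is mapped to max(current count, target_count), so no class is
    asked to shrink (over-samplers reject targets below the current count).
    target_count defaults to the size of the majority class.
    """
    counts = Counter(y)
    if target_count is None:
        target_count = max(counts.values())
    return {label: max(n, target_count) for label, n in counts.items()}


# Hypothetical usage, assuming a recent imbalanced-learn release:
# from imblearn.combine import SMOTETomek
# strategy = build_sampling_strategy(y, target_count=2665)
# smk = SMOTETomek(sampling_strategy=strategy)
# x_res, y_res = smk.fit_resample(x, y)
```

SMOTE interpolates between nearest neighbours within a class (default `k_neighbors=5`), so a class as small as "10 minutes" (4 rows) may be too small to over-sample unless `k_neighbors` is lowered.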
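Answer 1 (score: 0)
I think you should keep the target variable in its original proportions. SMOTE may give you a better-looking test set and better scores, but the model may then fail on new data entered by users (real-time data).
Whether to apply SMOTE is up to you. You can use the following code: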
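from imblearn.over_sampling import SMOTE
# "minority" over-samples only the smallest class
smote = SMOTE(sampling_strategy="minority")
# fit_sample is the old name; recent imbalanced-learn releases use fit_resample
X, Y = smote.fit_resample(x_train_data, y_train_data)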
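The concern about real-time data is commonly handled by ordering: split off a test set with the original class proportions first, then resample only the training part. A minimal sketch, assuming plain Python lists for `X` and `y`; to stay runnable without imbalanced-learn it uses random duplication instead of SMOTE, and `split_then_oversample` is a hypothetical helper name:

```python
import random
from collections import Counter, defaultdict


def split_then_oversample(X, y, test_frac=0.2, seed=0):
    """Stratified split, then random over-sampling of the training part only.

    Random duplication stands in for SMOTE here; the point is the order of
    operations: the test set keeps the real class distribution.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for label, idx in by_class.items():
        rng.shuffle(idx)
        n_test = int(len(idx) * test_frac)   # same fraction per class keeps proportions
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]

    # Over-sample training indices so every class matches the largest one
    train_counts = Counter(y[i] for i in train_idx)
    target = max(train_counts.values())
    resampled = list(train_idx)
    for label, n in train_counts.items():
        pool = [i for i in train_idx if y[i] == label]
        resampled += [rng.choice(pool) for _ in range(target - n)]

    X_train = [X[i] for i in resampled]
    y_train = [y[i] for i in resampled]
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, y_train, X_test, y_test
```

With this ordering, scores on the held-out set reflect the real class mix that live data will have, while the model still trains on balanced classes.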