Python: Data imbalance in XGBoost multi-label classification

Asked: 2018-11-16 00:25:58

Tags: machine-learning classification xgboost multilabel-classification

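I have a stock-returns dataset in which the Y label is the direction of the price change (2 if the price moved up, 1 if it moved down, 0 if it did not move). Some of the features X are lagged label values (i.e. the previous day's price direction change).

I am trying to run an XGBoost classification model, but my data is highly imbalanced. Most Y label values are 0, meaning the stock price did not move.

How can I account for this imbalance in a multi-label XGBoost classification problem?

My code is as follows: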

import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

# df is assumed to be a pandas DataFrame that already holds these columns
X = df[["ret_D_lag_1", "ret_D_lag_2", "ret_D_lag_3"]]
y = df["ret_D_t1"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)

# use DMatrix for xgboost
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

# set xgboost params
param = {
    'max_depth': 3,  # the maximum depth of each tree
    'eta': 0.3,  # learning rate (step-size shrinkage applied at each boosting round)
    'verbosity': 0,  # quiet logging ('silent' is deprecated in recent XGBoost releases)
    'objective': 'multi:softprob',  # multiclass objective that outputs per-class probabilities
    'num_class': 3}  # the number of classes that exist in this dataset
num_round = 20  # the number of boosting rounds

# Train the model
bst = xgb.train(param, dtrain, num_round)

# Predict class probabilities and pick the most probable class for each row
preds = bst.predict(dtest)
best_preds = np.argmax(preds, axis=1)
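
One common way to account for this kind of imbalance is to give each training row a weight inversely proportional to the frequency of its class, so that the rare "up" and "down" rows count for more than the dominant "no move" rows. The sketch below is only an illustration under that assumption, not a tuned solution: the helper name `balanced_sample_weights` is hypothetical, and the formula `n_samples / (n_classes * class_count)` is the usual "balanced" heuristic. Note also that with three mutually exclusive classes and `multi:softprob`, the setup above is a multiclass problem rather than a multi-label one.

```python
from collections import Counter


def balanced_sample_weights(labels):
    """Return one weight per row, inversely proportional to class frequency.

    Hypothetical helper: weight for class c is
    n_samples / (n_classes * count_of_c), so the weights sum to n_samples.
    """
    labels = list(labels)
    counts = Counter(labels)          # rows per class
    n_samples = len(labels)
    n_classes = len(counts)           # classes actually present in the data
    class_weight = {
        cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()
    }
    return [class_weight[label] for label in labels]


# Toy label vector: class 0 ("no move") dominates
example_labels = [0, 0, 0, 0, 1, 2]
example_weights = balanced_sample_weights(example_labels)
print(example_weights)
```

The resulting weights could then be passed through the `weight` argument of `xgb.DMatrix`, for example `xgb.DMatrix(X_train, label=y_train, weight=balanced_sample_weights(y_train))`. Whether this helps should be checked with per-class metrics on the test set rather than overall accuracy, since accuracy is dominated by the majority class.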

0 answers:

No answers