When trying to train a model with LightGBM, I get the following warnings:
/home/me/projects/programming/project_name/.project_name-env/lib/python3.8/site-packages/lightgbm/basic.py:1286: UserWarning: Overriding the parameters from Reference Dataset.
  warnings.warn('Overriding the parameters from Reference Dataset.')
/home/me/projects/programming/project_name/.project_name-env/lib/python3.8/site-packages/lightgbm/basic.py:1098: UserWarning: categorical_column in param dict is overridden.
  warnings.warn('{} in param dict is overridden.'.format(cat_alias))
The idea of the code below is:
Code:
# preprocess train data and create train dataset
X_train = df_train.drop(drop_columns, axis=1)
category_encoder = category_encoding.CategoryEncoder()
category_encoder.fit_transform(X_train, ['CAT0', 'CAT1'])
y_train = custom_encoding.custom_encode(df_train['LABEL_COL'].values)
train_dataset = lightgbm.Dataset(X_train, label=y_train, categorical_feature=None)
# same steps for validation data
X_valid = df_valid.drop(organizer_columns, axis=1)
category_encoder.transform(X_valid)
y_valid = custom_encoding.custom_encode(df_valid['LABEL_COL'].values)
valid_dataset = lightgbm.Dataset(X_valid, label=y_valid, reference=train_dataset, categorical_feature = None)
# create classifier parameter dictionary
model_params = {
    'objective': 'multiclass',
    'num_classes': 20,
    'device_type': 'GPU'
}
# create classifier
clf = lightgbm.train(
    params=model_params,
    train_set=train_dataset,
    valid_sets=[valid_dataset],
    verbose_eval=False,
    num_boost_round=50
)
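I can't for the life of me see where I'm overriding parameters from the reference dataset for the validation dataset, or where I'm overriding anything for the categorical columns.
I added `categorical_feature=None` to both train_dataset and valid_dataset to try to get rid of the warnings, but they were already there before I added it.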
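One way to narrow down which of my own calls triggers each warning is to turn the warning into an exception and read the traceback. The following is only a minimal sketch using the standard library. It assumes the warnings are emitted through Python's `warnings` module, as the output above suggests. `find_warning_origin` and `noisy_step` are hypothetical names for illustration: `noisy_step` is a stand-in that emits the same message, not a LightGBM function.

```python
import traceback
import warnings


def find_warning_origin(func, *args, **kwargs):
    """Run func with UserWarning escalated to an exception.

    Returns the formatted traceback of the first UserWarning raised,
    or None if func finished without emitting one.
    """
    with warnings.catch_warnings():
        # Inside this block every UserWarning is raised as an exception,
        # so the traceback shows the full call chain leading to it.
        warnings.simplefilter("error", UserWarning)
        try:
            func(*args, **kwargs)
        except UserWarning:
            return traceback.format_exc()
    return None


def noisy_step():
    # Hypothetical stand-in for a library call that emits the warning.
    warnings.warn("Overriding the parameters from Reference Dataset.")


trace = find_warning_origin(noisy_step)
print(trace)
```

In the real script, the stand-in would be replaced by a small function wrapping the `lightgbm.train(...)` call. Execution stops at the first warning, so only one origin is reported per run.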
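If anyone could explain the proper division of responsibilities between lightgbm.train and lightgbm.Dataset, that would be helpful too.
If anyone could explain the proper division between the parameter dictionary and the named arguments of the train function, that would also be very helpful!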