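To preprocess the data before handing it to my model for prediction, I use a `ColumnTransformer`. However, the transformers I want to use are already fitted. I now want to make predictions and don't want to train them again. My code: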
```python
import pickle

import pandas as pd
from sklearn.compose import ColumnTransformer

features = pd.DataFrame(tokens)
num_attributes = ['frequency', 'relative_frequency', 'relative_first_occurrence']
cat_attributes = ["pos_tag"]

# reload already determined encoder
with open(ENCODER_SVM, 'rb') as f:
    cat_encoder = pickle.load(f)
with open(SCALER_SVM, 'rb') as f:
    scaler = pickle.load(f)

full_pipeline = ColumnTransformer([
    ("num", scaler, num_attributes),
    ("cat", cat_encoder, cat_attributes),
])

# transform categories and number attributes
# don't need to fit the data, because the objects were already fit,
# that is why they are loaded via pickle
transformed = full_pipeline.transform(features)
```
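The behaviour can be reproduced in isolation. The sketch below uses made-up toy data (only the column names echo the question) and shows two things: `transform()` on a freshly built `ColumnTransformer` raises `NotFittedError` even though both of its transformers are already fitted, and `fit_transform()` fits clones of those transformers on whatever data it is given, so the scaling statistics then come from that data rather than from the training data.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.exceptions import NotFittedError
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; only the column names echo the question.
train = pd.DataFrame({"frequency": [3.0, 1.0, 4.0, 1.0],
                      "pos_tag": ["NN", "VB", "NN", "JJ"]})
new = pd.DataFrame({"frequency": [10.0, 20.0],
                    "pos_tag": ["NN", "VB"]})

# Fit the two transformers separately, as the pickled objects were.
scaler = StandardScaler().fit(train[["frequency"]])
cat_encoder = OneHotEncoder(handle_unknown="ignore").fit(train[["pos_tag"]])

full_pipeline = ColumnTransformer([
    ("num", scaler, ["frequency"]),
    ("cat", cat_encoder, ["pos_tag"]),
])

# 1) transform() checks the ColumnTransformer's *own* fitted state,
#    which only fit()/fit_transform() sets, so this call raises.
try:
    full_pipeline.transform(new)
    raised = False
except NotFittedError:
    raised = True

# 2) fit_transform() fits clones of the sub-transformers on the data it
#    receives; the original `scaler` is left untouched, but the clone
#    used inside the ColumnTransformer now reflects `new`.
full_pipeline.fit_transform(new)
refit_mean = full_pipeline.named_transformers_["num"].mean_[0]

print(raised)            # whether NotFittedError was raised
print(scaler.mean_[0])   # mean learned from `train`
print(refit_mean)        # mean learned from `new` by the internal clone
```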
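`scaler` is a fitted `StandardScaler` and `cat_encoder` is a fitted `OneHotEncoder`. I only load the persisted Python objects and reuse them. Because they are already fitted, I call the `transform()` method instead of `fit_transform()`. For some reason, the `ColumnTransformer` apparently has to do some fitting of its own, because I get this error:

`NotFittedError: This ColumnTransformer instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.`

However, when I call `fit_transform()` instead, the transformers seem to be fitted again, which makes reloading the Python objects pointless. This is quite annoying. Most machine learning tutorials only teach you how to train a model and skip the testing and prediction phase, at least its actual implementation. Can anybody help me?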
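Because the scaler and the encoder are already fitted, one possible workaround (a sketch, not the only option) is to skip the `ColumnTransformer` at prediction time and apply each fitted object to its own columns, then stack the results. The `transform_with_fitted` helper, the toy data and `handle_unknown="ignore"` are hypothetical stand-ins; in the real code `scaler` and `cat_encoder` would come from the pickle files. This assumes the model expects the numeric block first and the one-hot block second, which is the order the `ColumnTransformer` in the question lists its transformers in.

```python
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def transform_with_fitted(features, scaler, cat_encoder, num_cols, cat_cols):
    """Apply already-fitted transformers to their columns; nothing is refitted."""
    num_part = scaler.transform(features[num_cols])
    cat_part = cat_encoder.transform(features[cat_cols])
    # OneHotEncoder returns a sparse matrix by default; densify it so the
    # two blocks can be stacked side by side with NumPy.
    if sparse.issparse(cat_part):
        cat_part = cat_part.toarray()
    # Numeric block first, then the one-hot block.
    return np.hstack([num_part, cat_part])


# Hypothetical stand-ins for the pickled objects; in the real code they
# would come from pickle.load(...) instead of being fitted here.
num_attributes = ["frequency", "relative_frequency"]
cat_attributes = ["pos_tag"]
train = pd.DataFrame({
    "frequency": [3, 1, 4, 1],
    "relative_frequency": [0.3, 0.1, 0.4, 0.1],
    "pos_tag": ["NN", "VB", "NN", "JJ"],
})
scaler = StandardScaler().fit(train[num_attributes])
cat_encoder = OneHotEncoder(handle_unknown="ignore").fit(train[cat_attributes])

# Prediction-time data: only transform() is called on the fitted objects.
new = pd.DataFrame({"frequency": [2], "relative_frequency": [0.2], "pos_tag": ["NN"]})
transformed = transform_with_fitted(new, scaler, cat_encoder,
                                    num_attributes, cat_attributes)
print(transformed.shape)
```

If fitting once during training is acceptable, another option is to fit the `ColumnTransformer` itself on the training data and pickle that whole fitted object instead of its parts; after `pickle.load`, calling `transform()` on it works without any further fitting.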