I have been studying the following git repo (https://github.com/h2oai/h2o-tutorials/blob/master/tutorials/glrm/glrm.census.labor.violations.ipynb). It shows that GLRM can be used for dimensionality reduction. The code is as follows:
import h2o
from h2o.estimators.glrm import H2OGeneralizedLowRankEstimator

# Fit a rank-10 GLRM on the standardized data.
acs_model = H2OGeneralizedLowRankEstimator(k=10,
                                           transform="STANDARDIZE",
                                           loss="Quadratic",
                                           regularization_x="Quadratic",
                                           regularization_y="L1",
                                           gamma_x=0.25,
                                           gamma_y=0.5,
                                           max_iterations=100)
acs_model.train(x=acs_full.names, training_frame=acs_full)
print(acs_model)

# Fetch the low-rank representation X of the training frame.
zcta_arch_x = h2o.get_frame(acs_model._model_json["output"]["representation_name"])
zcta_arch_x.head()
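The example shows that we can obtain a reduced-dimension representation, zcta_arch_x, of the table acs_full. Suppose acs_full is the training dataset. Can we use the trained model acs_model to transform a new test dataset, so that the test dataset's dimensionality is reduced in the same way?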
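To make the question concrete, here is a minimal NumPy sketch of what I mean by "transforming" new data. It is not H2O code. It assumes the quadratic loss and quadratic regularization on X used above, holds the learned archetype matrix Y fixed, and solves only for the new rows of X. The function name `project_onto_archetypes` and the random stand-in matrices are hypothetical, and the test rows are assumed to be already standardized with the training means and standard deviations.

```python
import numpy as np

def project_onto_archetypes(a_new, y, gamma_x=0.25):
    """Solve min_X ||A_new - X Y||_F^2 + gamma_x * ||X||_F^2 with Y held fixed.

    a_new: (n, p) new rows, assumed standardized like the training data.
    y:     (k, p) archetypes from an already-trained model.
    Returns X_new with shape (n, k).
    """
    k = y.shape[0]
    # Setting the gradient to zero gives X (Y Y^T + gamma_x I) = A_new Y^T.
    gram = y @ y.T + gamma_x * np.eye(k)
    # gram is symmetric, so solve for X^T and transpose back.
    return np.linalg.solve(gram, y @ a_new.T).T

rng = np.random.default_rng(0)
y = rng.normal(size=(3, 8))        # stand-in for trained archetypes (k=3, 8 columns)
a_test = rng.normal(size=(5, 8))   # stand-in for standardized test rows
x_test = project_onto_archetypes(a_test, y)
print(x_test.shape)
```

What I am looking for is the H2O equivalent of this step: reusing the Y learned from acs_full to get a 10-column representation of the test frame, without retraining.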