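I have a training set of 1000 rows (e.g. 1 row per day). I get a set of 5 future predictions (model.predict). Over the following 5 days I obtain the actual data for those 5 days (numbers, e.g. sales). Now I want the model to train on just these 5 real-world data points, rather than on 1005 rows (the 1000 original rows plus the 5 new ones).
Is it possible to do this? Sorry for the "basic" question, and thanks for any help (including links, if this has already been answered).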
import h2o
from h2o.automl import H2OAutoML
import pandas as pd
h2o.init()
data_path = "./df.csv"
df = h2o.import_file(data_path)
y = "c"
# Split into roughly 80% / 19% / 1% (the last split is whatever remains)
splits = df.split_frame(ratios=[0.8, 0.19], seed=1)
train = splits[0]  # initial training set
test = splits[1]   # test set 1 (later reused as the new training data)
test2 = splits[2]  # stands in for the real-world values
aml = H2OAutoML(max_runtime_secs=120, project_name='try', seed=1234)
aml.train(y=y, training_frame=train)
# First set of predictions, from the model trained on the train set
yy = aml.predict(test)
x = yy.as_data_frame(use_pandas=True)
print(x)
# The test set is now treated as "new real-world data",
# to be added as incremental training of the model
aml.train(y=y, training_frame=test)
# Get the predictions again, this time on test2
yy = aml.predict(test2)
x = yy.as_data_frame(use_pandas=True)
print(x)
I tried to retrain on the "new data set" (assuming that is what line 30 does), but the numbers I get are strange.
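To make concrete what I mean by training on only the new rows, here is a minimal plain-Python sketch. It is not H2O and says nothing about what H2OAutoML does internally: `OnlineLinearModel`, `partial_fit`, the learning rate, and the synthetic sales numbers are all made up for illustration. The point is only that the second call sees the 5 new rows and starts from the parameters already learned from the first 1000.

```python
class OnlineLinearModel:
    """Hypothetical one-feature linear model trained by stochastic gradient descent."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        self.weight = 0.0
        self.bias = 0.0
        self.rows_seen = 0  # total rows passed to partial_fit so far

    def predict(self, x):
        return self.weight * x + self.bias

    def partial_fit(self, rows, epochs=1):
        """Update the current parameters in place using only the rows given."""
        for _ in range(epochs):
            for x, y in rows:
                error = self.predict(x) - y
                self.weight -= self.learning_rate * error * x
                self.bias -= self.learning_rate * error
        self.rows_seen += len(rows)
        return self


# Made-up history: 1000 daily rows, x = scaled day index, y = sales.
history = [(day / 1000.0, 2.0 * (day / 1000.0) + 1.0) for day in range(1000)]
# Made-up "real world" values for the next 5 days, with a shift in level.
new_rows = [(day / 1000.0, 2.0 * (day / 1000.0) + 1.5) for day in range(1000, 1005)]

model = OnlineLinearModel()
model.partial_fit(history, epochs=50)  # initial training on the 1000 rows

x_last, y_last = new_rows[-1]
error_before = abs(model.predict(x_last) - y_last)

model.partial_fit(new_rows)  # incremental step: only the 5 new rows are used
error_after = abs(model.predict(x_last) - y_last)

print(model.rows_seen, error_before, error_after)
```

In my case I would want the equivalent of that second `partial_fit` call for the AutoML model, without passing the original 1000 rows again.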