Training a model with a while loop

Asked: 2021-07-01 22:09:36

Tags: python machine-learning while-loop dataset

I am trying to iterate over some values while the length of my dataset S_train is <= a given number, 11 in this case. This is what I have so far:

S_new = train
T_new = test
mu_new = mu
mu_test_new = mu_test

while len(S_new) <= 11:
  ground_test =  T_new[target].values.tolist()
  acquisition_function = abs(mu_test - ground_test)
  max_item = np.argmax(acquisition_function) #step 3 : value in test set that maximizes the abs difference of the energy
  alpha_al = test.iloc[[max_item]]  #identify the minimum step in test set
  S_new = S_new.append(alpha_al)
  len(S_new)
  T_new = T_new.drop(test.index[max_item])
  len(T_new)

  gpr = GaussianProcessRegressor(
    # kernel is the covariance function of the gaussian process (GP)
    kernel=Normalization( # kernel equals to normalization -> normalizes a kernel using the cosine of angle formula, k_normalized(x,y) = k(x,y)/sqrt(k(x,x)*k(y,y))
        # graphdot.kernel.fix.Normalization(kernel), set kernel as marginalized graph kernel, which is used to calculate the similarity between 2 graphs
        # implement the random walk-based graph similarity kernel as Kashima, H., Tsuda, K., & Inokuchi, A. (2003). Marginalized kernels between labeled graphs. ICML
        Tang2019MolecularKernel()
    ),
    alpha=1e-4, # value added to the diagonal of the kernel matrix during fitting
    optimizer=True, # default optimizer of L-BFGS-B based on scipy.optimize.minimize
    normalize_y=True, # normalize the y values so that their mean and variance are 0 and 1, respectively; reversed when predictions are returned
    regularization='+', # alpha (1e-4 in this case) is added to the diagonal of the kernel matrix
     )
  
  start_time = time.time()
  gpr.fit(S_new.graphs, S_new[target], repeat=1, verbose=True) # fit the training set graphs (independent variable) against S_new[target] (dependent variable)
  end_time = time.time()
  print("the total time consumption is " + str(end_time - start_time) + ".")
 
  gpr.kernel.hyperparameters
  
  rmse_training = []
  rmse_test = []


  mu_new = gpr.predict(S_new.graphs)

  print('Training set')
  print('MAE:', np.mean(np.abs(S_new[target] - mu_new)))
  print('RMSE:', np.std(S_new[target] - mu_new))
  rmse_training.append(np.std(S_new[target] - mu_new)

  mu_test_new = gpr.predict(T_new.graphs)
  print('Test set')
  print('MAE:', np.mean(np.abs(T_new[target] - mu_test_new)))
  print('RMSE:', np.std(T_new[target] - mu_test_new))
  rmse_test.append(np.std(T_new[target] - mu_test_new)

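Basically, I am finding the value in T_new that maximizes the absolute error between the i-th elements of T_new and mu_test, adding it to the set S_train, and then removing it from T_new. With the new S_train, I retrain my model and then repeat the steps described above. I have never used a while loop before and have been checking the syntax; it looks correct to me, but I get this error message: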

File "<ipython-input-55-d284ca5f9d1f>", line 42
    mu_test_new = gpr.predict(T_new.graphs)
              ^
SyntaxError: invalid syntax

Do you know what is causing this? Any suggestions are much appreciated. Thank you for your help.

1 Answer:

Answer 0 (score: 0)

The problem is not the while loop. It is just a typo. Specifically, this line:

  rmse_training.append(np.std(S_new[target] - mu_new)

is missing a closing parenthesis.
If you try

  rmse_training.append(np.std(S_new[target] - mu_new))

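the error you are seeing will go away. Note that the later rmse_test.append(...) line in your code is missing its closing parenthesis in the same way, so it needs the same fix.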
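If a missing bracket is hard to spot by eye, a small checker can point at it. The following is only a rough sketch: `find_unclosed` is a hypothetical helper written for this answer, not part of any library, and it assumes that no brackets appear inside string literals or comments.

```python
# Naive bracket checker. Assumption: brackets inside string literals or
# comments are not treated specially, so this is a quick aid, not a parser.
PAIRS = {")": "(", "]": "[", "}": "{"}


def find_unclosed(source):
    """Return (line_number, bracket) for every opening bracket never closed."""
    stack = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for ch in line:
            if ch in "([{":
                # Remember where each opening bracket was seen.
                stack.append((lineno, ch))
            elif ch in PAIRS and stack and stack[-1][1] == PAIRS[ch]:
                # A matching closing bracket cancels the most recent opener.
                stack.pop()
    return stack


snippet = (
    "rmse_training.append(np.std(S_new[target] - mu_new)\n"
    "mu_test_new = gpr.predict(T_new.graphs)\n"
)
print(find_unclosed(snippet))  # [(1, '(')]
```

The result names line 1 of the snippet, the `append` line, rather than the `gpr.predict` line that the traceback points to.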

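It is worth noting that an error reported for a particular line is sometimes caused by a syntax error on an earlier line, which is something to keep in mind when debugging.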
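This behaviour can be reproduced in isolation with the built-in `compile()`. The snippet below is a minimal sketch; `syntax_error_location` and the toy source strings are made up for illustration. Which line number is reported depends on the interpreter version: older versions tend to point at the line after the unclosed parenthesis, as in the traceback in the question, while newer ones may point at the line that actually contains it.

```python
# Toy source with the same kind of mistake as in the question.
broken = (
    "values = []\n"
    "values.append(abs(1 - 2)\n"  # line 2: closing parenthesis missing
    "result = len(values)\n"      # line 3: valid on its own
)


def syntax_error_location(source):
    """Return (lineno, msg) of the SyntaxError from compiling source, or None."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError as err:
        return err.lineno, err.msg
    return None


# The reported line and message vary between Python versions.
print(syntax_error_location(broken))

# Adding the missing parenthesis on line 2 makes the whole snippet compile.
fixed = broken.replace("abs(1 - 2)\n", "abs(1 - 2))\n")
print(syntax_error_location(fixed))  # None
```

If the reported line looks correct, checking the line or two above it for unbalanced brackets or quotes is usually the quickest next step.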