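I have several Y variables and I'm running a loop that fits one model per variable. I need to build a 2-D NumPy array that holds all of the coefficients, but I keep getting the error below.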
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 42)
accuracy_logistic = np.ones(100,dtype = float)
model_log = []
y_pred_output = np.array([])
pred_coef = pd.DataFrame()
for i in range(0,100):
    model_log = LogisticRegression(class_weight='balanced')
    model_log.fit(X_train,y_train[:,i])
    log_prediction = model_log.predict(X_test)
    accuracy_logistic[i] = accuracy_score(y_test[:,i],log_prediction)
    ##Error inline below##
    pred_coef = np.append(pred_coef, np.transpose(np.array(model_log.coef_)), axis= 0)
Error message:
ValueError Traceback (most recent call
---> 12 pred_coef = np.append(pred_coef, np.transpose(np.array(model_log.coef_)), axis= 0)
~/anaconda3/lib/python3.7/site-packages/numpy/lib/function_base.py in append(arr, values, axis)
4526 values = ravel(values)
4527 axis = arr.ndim-1
-> 4528 return concatenate((arr, values), axis=axis)
ValueError: all the input arrays must have same number of dimensions
Answer (score: 2)
Maybe I'm misunderstanding your goal, but I think the problem is in this line:
pred_coef = np.append(pred_coef, np.transpose(np.array(model_log.coef_)), axis= 0)
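As background: when an explicit `axis` is given, `np.append` simply calls `np.concatenate`, so both operands must have the same number of dimensions. The NumPy-only sketch below reproduces that class of error with a hypothetical 1-D accumulator and a made-up stand-in for the transposed `coef_` array (shape `(n_features, 1)`), then shows one way to keep the shapes consistent if you really want a 2-D NumPy array: start from an empty `(0, n_features)` array and append `(1, n_features)` rows. The names `coef`, `acc_1d`, `acc_2d` and `error_message` are illustrative only.

```python
import numpy as np

n_features = 4
# Hypothetical stand-in for model_log.coef_ of a binary LogisticRegression: shape (1, n_features)
coef = np.arange(n_features, dtype=float).reshape(1, n_features)

# Mismatched ndim: a 1-D accumulator vs. the 2-D transposed coefficients (n_features, 1)
acc_1d = np.array([])
error_message = None
try:
    np.append(acc_1d, np.transpose(coef), axis=0)
except ValueError as exc:
    error_message = str(exc)
    print("ValueError:", error_message)

# Consistent ndim: start from an empty (0, n_features) array and append (1, n_features) rows
acc_2d = np.empty((0, n_features))
for _ in range(3):
    acc_2d = np.append(acc_2d, coef, axis=0)
print(acc_2d.shape)  # (3, 4)
```

Each `np.append` call copies the whole array, so with many models it is cheaper to collect the rows in a list and stack them once.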
You have already created pred_coef as a DataFrame, so it looks like you meant to use DataFrame.append instead:
pred_coef = pred_coef.append(pd.Series(model_log.coef_[0]), ignore_index=True)
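This should give you a DataFrame in which each row holds the coefficients for one y. (DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on newer versions use pd.concat or the list-based approach below.)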
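For pandas versions without `DataFrame.append`, the same row-by-row build can be written with `pd.concat`. This is a minimal sketch in which `coef_rows` is a hypothetical list standing in for the `model_log.coef_[0]` arrays of three fitted models; the first row is assigned directly rather than concatenated onto an empty DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for model_log.coef_[0] from three fitted models (4 features each)
coef_rows = [np.array([0.1, 0.2, 0.3, 0.4]) * k for k in range(1, 4)]

pred_coef = None
for coef in coef_rows:
    # One-row DataFrame per model; pd.concat takes the place of the removed DataFrame.append
    row = pd.DataFrame([coef])
    pred_coef = row if pred_coef is None else pd.concat([pred_coef, row], ignore_index=True)

print(pred_coef.shape)  # (3, 4)
```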
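Edit: @Alollz made a good point: repeatedly appending to a DataFrame is inefficient, because every append copies the entire frame. Instead of creating the pred_coef DataFrame up front and appending coefficients to it, create a list before the loop, append each model's coefficients to the list, and build the DataFrame from the list once the loop is done. For example: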
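import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# X is the feature matrix; y is a 2-D NumPy array with one binary target per column
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
accuracy_logistic = np.ones(y.shape[1], dtype=float)
coef_list = []  # one 1-D coefficient array per target
for i in range(y.shape[1]):
    model_log = LogisticRegression(class_weight='balanced')
    model_log.fit(X_train, y_train[:, i])
    log_prediction = model_log.predict(X_test)
    accuracy_logistic[i] = accuracy_score(y_test[:, i], log_prediction)
    coef_list.append(model_log.coef_[0])
# Build the DataFrame once, after the loop: one row per target, one column per feature
pred_coef = pd.DataFrame(coef_list)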
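Since the original goal was a 2-D NumPy array, the list of coefficient rows can also be stacked with `np.vstack`. Below is a self-contained sketch on synthetic data; the data itself, the 3 targets, the 5 features and the column names `x0`..`x4` are assumptions made for illustration. It also assumes every target column is binary, so that `coef_` has exactly one row per model.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: 200 samples, 5 features, 3 binary targets
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Target j is 1 when feature j is positive, so every column contains both classes
y = np.column_stack([(X[:, j] > 0).astype(int) for j in range(3)])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

n_targets = y.shape[1]
accuracy_logistic = np.empty(n_targets, dtype=float)
coef_list = []
for i in range(n_targets):
    model = LogisticRegression(class_weight="balanced")
    model.fit(X_train, y_train[:, i])
    accuracy_logistic[i] = accuracy_score(y_test[:, i], model.predict(X_test))
    # coef_ has shape (1, n_features) for a binary target; keep its single row
    coef_list.append(model.coef_[0])

# Stack the 1-D rows into a 2-D array of shape (n_targets, n_features)
coef_array = np.vstack(coef_list)
pred_coef = pd.DataFrame(coef_array, columns=[f"x{j}" for j in range(X.shape[1])])
print(coef_array.shape)  # (3, 5)
```

Stacking once at the end avoids the repeated copying of `np.append` or `DataFrame.append`, and `pred_coef.to_numpy()` returns the same 2-D array if you start from the DataFrame instead.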