How do I append the regression coefficients of multiple models?

Asked: 2019-01-26 08:17:15

Tags: python pandas numpy logistic-regression

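I have multiple Y variables and am running a loop to fit one model per variable. I need to build a 2-D NumPy array that holds the coefficients of all the models, but I keep hitting the error shown below.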

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 42)
accuracy_logistic = np.ones(100,dtype = float)
model_log = []
y_pred_output = np.array([])
pred_coef = pd.DataFrame()
for i in range(0,100):

    model_log = LogisticRegression(class_weight='balanced')
    model_log.fit(X_train,y_train[:,i])
    log_prediction = model_log.predict(X_test)
    accuracy_logistic[i] = accuracy_score(y_test[:,i],log_prediction)

    ## Error on the line below ##

    pred_coef = np.append(pred_coef, np.transpose(np.array(model_log.coef_)), axis= 0)

Error message


ValueError                                Traceback (most recent call 
---> 12     pred_coef = np.append(pred_coef, np.transpose(np.array(model_log.coef_)), axis= 0)

~/anaconda3/lib/python3.7/site-packages/numpy/lib/function_base.py in append(arr, values, axis)
   4526         values = ravel(values)
   4527         axis = arr.ndim-1
-> 4528     return concatenate((arr, values), axis=axis)

ValueError: all the input arrays must have same number of dimensions

1 Answer:

Answer 0 (score: 2)

Maybe I have misunderstood your goal, but I think the error comes from this line:

pred_coef = np.append(pred_coef, np.transpose(np.array(model_log.coef_)), axis= 0)
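For context, the traceback shows that np.append with an axis argument hands both inputs straight to concatenate, which refuses arrays whose number of dimensions differ. The sketch below reproduces that kind of mismatch in isolation. The coefficient values are made up, and the empty 1-D accumulator is only an assumption about what the accumulator might have looked like when the error was raised.

```python
import numpy as np

# Made-up stand-in for model_log.coef_ of a binary LogisticRegression:
# shape (1, n_features).
coef = np.array([[0.5, -1.2, 0.3]])

# A 1-D empty array and a 2-D column have a different number of
# dimensions, so np.append(..., axis=0) raises ValueError.
empty_1d = np.array([])
try:
    np.append(empty_1d, coef.T, axis=0)
    append_failed = False
except ValueError:
    append_failed = True

# Starting from an empty 2-D array with the right number of columns works:
# each call adds one row of coefficients.
acc = np.empty((0, coef.shape[1]))
acc = np.append(acc, coef, axis=0)
acc = np.append(acc, coef * 2, axis=0)
```

If you want to stay in NumPy, an accumulator like acc above gives one row per model, although np.append copies the whole array on every call.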

You have already created pred_coef as a DataFrame, so it seems you should use the df.append method instead.

pred_coef = pred_coef.append(pd.Series(model_log.coef_[0]), ignore_index=True)

This should give you a DataFrame in which each row holds the coefficients for a given y.
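One caveat for newer environments: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0. A rough equivalent with pd.concat might look like the sketch below. The fake_coefs list is a hypothetical stand-in for the model_log.coef_ arrays produced inside your loop.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for model_log.coef_ from two fitted models,
# each with shape (1, n_features).
fake_coefs = [
    np.array([[0.5, -1.2, 0.3]]),
    np.array([[0.1, 0.4, -0.7]]),
]

rows = []
for coef in fake_coefs:
    # One single-row DataFrame per model.
    rows.append(pd.DataFrame([coef[0]]))

# Stack the rows once; ignore_index renumbers them 0, 1, 2, ...
pred_coef = pd.concat(rows, ignore_index=True)
```

Here pred_coef has one row per model and one column per feature, the same layout the df.append call produces on older pandas versions.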

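Edit: @Alollz made a good point that repeatedly appending to a DataFrame is inefficient. Instead of creating the pred_coef DataFrame up front and appending coefficients to it, create a list before the loop, append each model's coefficients to that list, and build the DataFrame from the list once the loop has finished. For example,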

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 42)    
accuracy_logistic = np.ones(y.shape[1],dtype = float)
model_log = []
y_pred_output = np.array([])
coef_list = []

for i in range(0,y.shape[1]):  
    model_log = LogisticRegression(class_weight='balanced')
    model_log.fit(X_train,y_train[:,i])
    log_prediction = model_log.predict(X_test)
    accuracy_logistic[i] = accuracy_score(y_test[:,i],log_prediction)
    coef_list.append(model_log.coef_[0])

pred_coef = pd.DataFrame(coef_list)
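Since the question asks for a 2-D NumPy array, here is a self-contained sketch of the same list-based pattern that also stacks the list with np.vstack. The synthetic X and y (200 samples, 4 features, 3 binary targets) are assumptions made only so the snippet runs on its own; substitute your real data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption, not the asker's data).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
true_weights = rng.normal(size=(4, 3))
noise = rng.normal(scale=0.5, size=(200, 3))
y = (X @ true_weights + noise > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

n_targets = y.shape[1]
accuracy_logistic = np.zeros(n_targets)
coef_list = []

for i in range(n_targets):
    model_log = LogisticRegression(class_weight="balanced")
    model_log.fit(X_train, y_train[:, i])
    log_prediction = model_log.predict(X_test)
    accuracy_logistic[i] = accuracy_score(y_test[:, i], log_prediction)
    # For a binary target, coef_ has shape (1, n_features); keep the single row.
    coef_list.append(model_log.coef_[0])

# 2-D NumPy array with one row per target and one column per feature.
coef_array = np.vstack(coef_list)
# The same values as a DataFrame.
pred_coef = pd.DataFrame(coef_list)
```

Both coef_array and pred_coef are built once after the loop, so no array or DataFrame is copied on each iteration.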