Question

I am trying to implement conditional logistic regression with an added frailty parameter and L1 regularization.

Equation 1 is the log-likelihood, and equation 2 is the gradient of the log-likelihood.

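where X^W_rh is the feature vector of the winning horse h in race r,

W_rh is the frailty parameter,

X_rh is the feature vector of horse h in race r,

lambda is the L1 regularization parameter, and

Beta is the vector of weights.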
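The two equations themselves did not survive in this copy of the question. Assuming the frailty enters the linear predictor additively (as `np.exp(scores + frailty)` in the code below suggests) and that each race has exactly one winner, a standard conditional-logit form consistent with the definitions above would be the following. This is a reconstruction, not the referenced paper's exact notation; in the first terms h is the winner of race r, h' runs over every horse in race r, and sign(β) is taken elementwise as a subgradient of |β|:

$$
\ell(\beta) = \sum_{r} \Big[ X^{W}_{rh}\,\beta + W_{rh} - \log \sum_{h'} \exp\big(X_{rh'}\,\beta + W_{rh'}\big) \Big] - \lambda \sum_{j} \lvert \beta_j \rvert
$$

$$
\frac{\partial \ell}{\partial \beta} = \sum_{r} \Big[ X^{W}_{rh} - \frac{\sum_{h'} X_{rh'} \exp\big(X_{rh'}\,\beta + W_{rh'}\big)}{\sum_{h'} \exp\big(X_{rh'}\,\beta + W_{rh'}\big)} \Big] - \lambda\,\operatorname{sign}(\beta)
$$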

Here is my implementation so far:

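import numpy as np

def log_likelihood(features, weights, frailty, l1, groups):
    
    # Generate scores: linear predictor plus the frailty term
    scores = np.dot(features, weights) + frailty
    
    # Calculate unregularized log-likelihood
    # (scores already include the frailty, so it is not added a second time;
    # picking out the winners and summing per race are still missing here)
    ll = np.sum(scores - np.log(np.sum(np.exp(scores))))
    
    # Calculate log-likelihood with L1 regularization
    reg_ll = ll - (l1 * np.sum(np.abs(weights)))
    
    return reg_ll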

def conditional_log_reg(features, target, num_steps, learning_rate, groups, frailty, l1):
    
    # One weight per feature column (shape is a tuple, so it is indexed, not called)
    weights = np.zeros(features.shape[1])
    
    for step in range(num_steps):
        scores = np.dot(features, weights)
        
        predictions = np.exp(scores + frailty)
        
        # Here we need to sum the upper half within the group
        upper_pred = np.dot(features, predictions)
        
        # Here we need to sum the lower half by the group
        lower_pred = predictions
        
        # Here we need to sum features by group;
        # the L1 penalty contributes -l1 * sign(weights) to the gradient
        gradient = features - (upper_pred / lower_pred) - l1 * np.sign(weights)
        
        # Gradient ascent step on the penalized log-likelihood
        weights += learning_rate * gradient
        
        # Print log_likelihood every few steps (pass the weights, not the target)
        if step % 10000 == 0:
            print(log_likelihood(features, weights, frailty, l1, groups))
    
    return weights
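The "sum by group" placeholders in the code above can be filled with plain numpy, no pandas needed. A small sketch on made-up data, assuming `groups` holds one race id per row and `target` is 1 on the winner's row: `np.unique(..., return_inverse=True)` turns arbitrary race ids into 0-based codes, `np.bincount(..., weights=...)` sums a 1-D array per code, and `np.add.at` sums whole feature rows per code.

```python
import numpy as np

# Made-up data: 5 rows (horses) in 2 races with arbitrary race ids.
groups = np.array([10, 10, 10, 42, 42])
target = np.array([0, 1, 0, 1, 0])            # 1 marks the winner of each race
features = np.array([[1.0, 2.0],
                     [3.0, 4.0],
                     [5.0, 6.0],
                     [7.0, 8.0],
                     [9.0, 10.0]])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Map race ids to 0..n_groups-1 (similar to pandas' group codes).
_, codes = np.unique(groups, return_inverse=True)
n_groups = int(codes.max()) + 1

# Per-group sum of a 1-D array, like Series.groupby(groups).sum().
value_sums = np.bincount(codes, weights=values, minlength=n_groups)   # [6., 9.]

# Per-group sum of 2-D rows: np.add.at accumulates rows sharing a code.
feature_sums = np.zeros((n_groups, features.shape[1]))
np.add.at(feature_sums, codes, features)                              # [[9., 12.], [16., 18.]]

# Broadcast each group's total back to its rows, like groupby(...).transform("sum").
value_sums_per_row = value_sums[codes]                                # [6., 6., 6., 9., 9.]

# A boolean mask on the target picks out each race's winner row.
winner_features = features[target == 1]                               # [[3., 4.], [7., 8.]]
```

If the rows are already sorted by race, `np.add.reduceat` over the start offset of each race is a faster alternative to `np.add.at`.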

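I have two problems:

I'm not sure where the true (target) variable goes; my assumption is that it is X^W_rh. However, in Dr. Silverman's paper, which I'm using as a reference, he defines X^W_rh as the features of the winning horse in race r.
I don't know how to sum by group using numpy arrays. I'm familiar with grouped sums in Pandas, but I'm trying to avoid converting to a Pandas DataFrame inside the function.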
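One way both problems might fit together, as a minimal sketch rather than the paper's method: assume `target` is a 0/1 array with exactly one 1 per race, so `features[target == 1]` is the stack of X^W_rh rows; `groups` holds a race id per row; and `frailty` is a fixed per-row offset, not a fitted parameter. The helper `_race_parts` is hypothetical, and unlike the code in the question, `log_likelihood` and `gradient` here take `target` as an argument. The L1 term is handled by plain subgradient ascent, which shrinks weights but will not set them exactly to zero.

```python
import numpy as np


def _race_parts(scores, groups):
    """Hypothetical helper: per-race log-sum-exp and within-race softmax."""
    _, codes = np.unique(groups, return_inverse=True)   # race ids -> 0..R-1
    n_races = int(codes.max()) + 1
    race_max = np.full(n_races, -np.inf)
    np.maximum.at(race_max, codes, scores)              # max score per race
    shifted = np.exp(scores - race_max[codes])          # overflow-safe exp
    denom = np.bincount(codes, weights=shifted, minlength=n_races)
    log_denom = np.log(denom) + race_max                # log sum_h' exp(score)
    probs = shifted / denom[codes]                      # P(row wins its race)
    return log_denom, probs


def log_likelihood(features, weights, frailty, target, groups, l1):
    """Penalized conditional log-likelihood; target == 1 marks each winner."""
    scores = features @ weights + frailty               # X_rh beta + W_rh
    log_denom, _ = _race_parts(scores, groups)
    ll = scores[target == 1].sum() - log_denom.sum()
    return ll - l1 * np.abs(weights).sum()


def gradient(features, weights, frailty, target, groups, l1):
    """(Sub)gradient of log_likelihood with respect to weights."""
    scores = features @ weights + frailty
    _, probs = _race_parts(scores, groups)
    # sum_r sum_h' X_rh' p_rh' is a plain sum over all rows,
    # because every row belongs to exactly one race
    expected = probs @ features
    observed = features[target == 1].sum(axis=0)       # sum_r X^W_rh
    return observed - expected - l1 * np.sign(weights)


def conditional_log_reg(features, target, num_steps, learning_rate, groups, frailty, l1):
    """Plain (sub)gradient ascent on the penalized log-likelihood."""
    weights = np.zeros(features.shape[1])
    for step in range(num_steps):
        weights += learning_rate * gradient(features, weights, frailty, target, groups, l1)
        if step % 10000 == 0:
            print(step, log_likelihood(features, weights, frailty, target, groups, l1))
    return weights


# Toy usage: 4 races of 3 horses each, winner rows marked in target.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))
races = np.repeat(np.arange(4), 3)
target = np.zeros(12)
target[[0, 4, 8, 11]] = 1
weights = conditional_log_reg(X, target, 500, 0.05, races, np.zeros(12), 0.01)
```

Only the denominator needs a true per-race sum; the numerator of the gradient collapses to `probs @ features`. If exact zeros in the weights matter, a proximal (soft-thresholding) step after each gradient step on the unpenalized likelihood is the usual replacement for the `np.sign` subgradient.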

Implementing conditional logistic regression from scratch
