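Below is some custom code I built to perform logistic regression. I am exploiting the fact that the score function and the Fisher information matrix for logistic regression can be expressed as sums of per-row terms of the augmented data matrix (the design matrix including its intercept column).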
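For reference, these are the standard formulas the code relies on. Here x_i denotes row i of the n × p matrix X, y_i ∈ {0, 1} is the response and μ_i is the fitted probability:

```latex
\mu_i = \frac{\exp(x_i^\top \beta)}{1 + \exp(x_i^\top \beta)}, \qquad
U(\beta) = \sum_{i=1}^{n} (y_i - \mu_i)\, x_i, \qquad
\mathcal{I}(\beta) = \sum_{i=1}^{n} \mu_i (1 - \mu_i)\, x_i x_i^\top
```

Because both U(β) and 𝓘(β) are sums over i, splitting the rows into disjoint blocks and adding the per-block results should reproduce the full-data values, up to floating-point rounding.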
I. h_b_x
This function computes mu, the fitted probability exp(x'beta) / (1 + exp(x'beta)) for a single row x:
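h_b_x <- function(beta, x)
{
  # beta and x must both have one entry per model parameter
  if (length(beta) != length(x))
  {
    stop("Incompatible length of vectors")
  }
  # logistic (inverse-logit) function of the linear predictor x'beta
  eta <- sum(beta * x)
  return(exp(eta) / (1 + exp(eta)))
}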
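A vectorized alternative may be convenient when all rows are needed at once. This is a minimal sketch; the helper name `mu_all` is hypothetical and not part of the original code. It relies on base R's `plogis()`, which is the same logistic function, and on the matrix product `X %*% beta` to get every linear predictor in one step.

```r
# Hypothetical helper (not part of the original code): fitted probabilities
# mu_i for every row of X at once. plogis(eta) is base R's logistic function
# exp(eta) / (1 + exp(eta)); for very large eta it returns 1 instead of the
# NaN that exp(eta) / (1 + exp(eta)) gives once exp(eta) overflows to Inf.
mu_all <- function(beta, X) {
  as.vector(plogis(X %*% beta))
}

# Small example: an intercept column plus one covariate
X_demo <- cbind(1, c(-1, 0, 2))
beta_demo <- c(0.5, -0.25)
mu_all(beta_demo, X_demo)   # linear predictors are 0.75, 0.5 and 0
```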
II. Score function
This function computes the score function, i.e. the gradient of the log-likelihood with respect to beta:
score <- function(beta, X, y)
{
  # n denotes the number of observations in the data matrix X
  p <- dim(X)[2]
  # providing initial value for the score function
  sum_1 <- rep(0, p)
  for (i in 1:n)
  {
    sum_1 <- sum_1 + (y[i] - h_b_x(beta, X[i, ])) * X[i, ]
  }
  # sum_1 is a vector of length p
  return(sum_1)
}
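Since the score is the sum of (y_i − μ_i) x_i over rows, it can also be written as the matrix product t(X) %*% (y − μ). A minimal sketch of that form follows; `score_vec` is a hypothetical name, not the original code. In this form the number of observations is implied by the rows of `X` passed in, so no separate count is needed.

```r
# Hypothetical vectorized score (not the original code):
# U(beta) = sum_i (y_i - mu_i) x_i = t(X) %*% (y - mu)
score_vec <- function(beta, X, y) {
  mu <- as.vector(plogis(X %*% beta))  # fitted probability for each row
  as.vector(crossprod(X, y - mu))      # length-p score vector
}

# Example: at beta = 0 every mu_i is 0.5
X_demo <- cbind(1, c(1, 2, 3))
y_demo <- c(0, 1, 1)
score_vec(c(0, 0), X_demo, y_demo)   # c(0.5, 2)
```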
III. Fisher information matrix
This function computes the Fisher information matrix:
Fishers_matrix <- function(beta, X, y)
{
  # n denotes the number of observations in the data matrix X
  n <- dim(X)[1]
  # p denotes the number of parameters chosen for the logistic regression model
  p <- dim(X)[2]
  # providing initial value for the Fisher information matrix
  sum_1 <- matrix(0, nrow = p, ncol = p)
  for (i in 1:n)
  {
    mu_i <- h_b_x(beta, X[i, ])
    # add the rank-one term mu_i (1 - mu_i) x_i x_i^T for row i
    sum_1 <- sum_1 + (X[i, ] %*% t(X[i, ])) * mu_i * (1 - mu_i)
  }
  # sum_1 is a p x p matrix
  return(sum_1)
}
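The same quantity can be written as t(X) %*% diag(w) %*% X with weights w_i = μ_i(1 − μ_i). A minimal sketch, assuming a hypothetical helper `fisher_vec`. It avoids building the n × n matrix diag(w) by using R's recycling rule: for an n × p matrix `X` and a length-n vector `w`, `X * w` scales row i of `X` by `w[i]`.

```r
# Hypothetical vectorized Fisher information (not the original code):
# I(beta) = sum_i mu_i (1 - mu_i) x_i x_i^T = t(X) %*% diag(w) %*% X
fisher_vec <- function(beta, X) {
  mu <- as.vector(plogis(X %*% beta))
  w <- mu * (1 - mu)        # per-row weights mu_i (1 - mu_i)
  crossprod(X, X * w)       # X * w scales row i of X by w[i]
}

# Example: at beta = 0 every weight is 0.25
X_demo <- cbind(1, c(1, 2, 3))
fisher_vec(c(0, 0), X_demo)  # rows (0.75, 1.5) and (1.5, 3.5)
```

Unlike `Fishers_matrix`, this sketch drops the unused `y` argument.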
IV. EDA with the Titanic dataset
The following code lets you reproduce the EDA I performed:
library(titanic)
data("titanic_train")
data("titanic_test")
library(dplyr)
titanic_test$Survived <- 2
complete_data <- rbind(titanic_train, titanic_test)
complete_data$Embarked[complete_data$Embarked==""] <- "S"
complete_data$Age[is.na(complete_data$Age)] <- median(complete_data$Age, na.rm = TRUE)
complete_data <- as.data.frame(complete_data)
titanic_data <- select(complete_data,-c(Cabin, PassengerId, Ticket, Name))
titanic_data <- titanic_data[!titanic_data$Survived == "2", ]
y_vals <- titanic_data$Survived
x <- as.data.frame(titanic_data[,-1])
X <- model.matrix(y_vals~.,data = x)
# Specifying initial value for beta
beta <- as.numeric(rep(0.01, dim(X)[2]))
V. The problem I am facing
I have split the data into 3 parts:
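# Split the 891 training rows into three blocks of 297 rows, one per party
X_1 <- X[1:297, ]
X_2 <- X[298:594, ]
X_3 <- X[595:891, ]
y_1 <- y_vals[1:297]
y_2 <- y_vals[298:594]
y_3 <- y_vals[595:891]
# Fisher information held by each party
F_1 <- Fishers_matrix(beta, X_1, y_1)
F_2 <- Fishers_matrix(beta, X_2, y_2)
F_3 <- Fishers_matrix(beta, X_3, y_3)
# Score function held by each party
F_4 <- score(beta, X_1, y_1)
F_5 <- score(beta, X_2, y_2)
F_6 <- score(beta, X_3, y_3)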
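Interestingly, the sum of the Fisher information held by each party equals the Fisher information computed on the aggregated data. For the score function, however, this is not the case. Given that the score function (section II above) can be computed as a sum over the rows of the data (after adjusting constants), I am struggling to understand why this happens.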
# Sum of each party's Fishers Info and Score function
Fisher_sum <- F_1 + F_2 + F_3
Score_sum <- F_4 + F_5 + F_6
# Fisher info and Score function of aggregate information
Fish_full <- Fishers_matrix(beta,X,y_vals)
score_full <- score(beta,X,y_vals)
sum(Fisher_sum - Fish_full)
sum(Score_sum - score_full)
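The additivity argument can be checked in isolation with a minimal sketch on simulated data. All names and sizes here are hypothetical and do not use the Titanic data. The sketch splits the rows into three equal blocks with base R's `split()`, computes a vectorized score and Fisher information for each block with `lapply()`, adds the pieces with `Reduce()`, and compares the totals with the full-data values.

```r
# Minimal sketch on simulated data: all names and sizes here are hypothetical.
set.seed(1)
n_sim <- 90
X_sim <- cbind(1, matrix(rnorm(n_sim * 2), nrow = n_sim))  # intercept + 2 covariates
y_sim <- rbinom(n_sim, size = 1, prob = 0.5)
beta_sim <- rep(0.01, ncol(X_sim))

# Vectorized score and Fisher information; the row count comes from X itself
score_vec <- function(beta, X, y) {
  mu <- as.vector(plogis(X %*% beta))
  as.vector(crossprod(X, y - mu))
}
fisher_vec <- function(beta, X) {
  mu <- as.vector(plogis(X %*% beta))
  crossprod(X, X * (mu * (1 - mu)))
}

# Three parties, each holding a contiguous block of 30 rows
blocks <- split(seq_len(n_sim), rep(1:3, each = n_sim / 3))
score_parts <- lapply(blocks, function(idx)
  score_vec(beta_sim, X_sim[idx, , drop = FALSE], y_sim[idx]))
fisher_parts <- lapply(blocks, function(idx)
  fisher_vec(beta_sim, X_sim[idx, , drop = FALSE]))

# Sum of the per-party pieces versus the full-data values
score_sum <- Reduce(`+`, score_parts)
fisher_sum <- Reduce(`+`, fisher_parts)
stopifnot(isTRUE(all.equal(score_sum, score_vec(beta_sim, X_sim, y_sim))))
stopifnot(isTRUE(all.equal(fisher_sum, fisher_vec(beta_sim, X_sim))))
```

Because each function here derives everything from the rows it is given, the per-block results add up to the full-data values for both quantities, as the sum-over-rows formulas predict.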
I would appreciate any help. I believe there is an error in the code here (but if my statistics are wrong, I am happy to be corrected).