R: Error in custom code for Fisher information (logistic regression)

Time: 2019-01-25 00:31:40

Tags: r logistic-regression

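Below is some custom code I built to perform logistic regression. I am exploiting the fact that the score function and the Fisher information matrix for logistic regression can both be expressed as functions of the rows of the augmented data matrix.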

I. h_b_x

This function computes mu, the fitted probability exp(beta'x) / (1 + exp(beta'x)):

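h_b_x <- function(beta, x)
{
  # beta and x must have the same length for the inner product to be defined
  if (length(beta) != length(x))
  {stop("Incompatible length of vectors")}

  # logistic function of the linear predictor sum(beta * x)
  eta <- sum(beta * x)
  return(exp(eta) / (1 + exp(eta)))
}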
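
As a quick sanity check on h_b_x, the sketch below compares the hand-written formula with R's built-in plogis() from the stats package, which evaluates the same logistic function. The names h_b_x_naive and h_b_x_stable and the sample values are illustrative assumptions, not part of the original code.

```r
# Hand-written logistic function, mirroring the formula used in h_b_x above
h_b_x_naive <- function(beta, x) {
  stopifnot(length(beta) == length(x))
  eta <- sum(beta * x)
  exp(eta) / (1 + exp(eta))
}

# Hypothetical alternative: delegate to stats::plogis
h_b_x_stable <- function(beta, x) {
  stopifnot(length(beta) == length(x))
  plogis(sum(beta * x))
}

beta_demo <- c(0.01, 0.01, 0.01)   # illustrative values only
x_demo    <- c(1, 22, 7.25)

# Both versions agree for moderate linear predictors
all.equal(h_b_x_naive(beta_demo, x_demo), h_b_x_stable(beta_demo, x_demo))

# For a very large linear predictor exp() overflows to Inf, so the naive
# ratio becomes Inf / Inf = NaN, while plogis() returns 1
h_b_x_naive(800, 1)
h_b_x_stable(800, 1)
```

With the starting value of 0.01 used in this post, overflow is unlikely to matter, but it can appear once beta grows during iteration.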

II. Score function

This function computes the score function:

score <- function(beta,X,y)
{
  # n denotes the number of observations in the data matrix X
  p <- dim(X)[2]

  # providing initial value for the score function
  sum_1 <- rep(0,p)

  for (i in 1:n) 
  {
    sum_1 <- sum_1 +  (y[i] - h_b_x(beta,X[i,])) * X[i,]
  }

  # sum_1 is a vector of length p 
  return(sum_1)
}
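
The row-by-row sum above can also be written in matrix form as t(X) %*% (y - mu). The following is a minimal sketch under that assumption; score_vec and the simulated data are hypothetical and introduced only for illustration. Because the row count comes from X itself, this form needs no separate loop bound.

```r
# Vectorized score for logistic regression: X' (y - mu)
score_vec <- function(beta, X, y) {
  stopifnot(ncol(X) == length(beta), nrow(X) == length(y))
  mu <- plogis(drop(X %*% beta))   # fitted probabilities, one per row
  drop(crossprod(X, y - mu))       # vector of length p
}

# Small simulated example (not the Titanic data)
set.seed(1)
X_demo <- cbind(1, matrix(rnorm(20), nrow = 10))  # intercept + 2 covariates
y_demo <- rbinom(10, size = 1, prob = 0.5)
beta_demo <- rep(0.01, ncol(X_demo))

# Loop version with the row count taken from X, for comparison
loop_score <- rep(0, ncol(X_demo))
for (i in seq_len(nrow(X_demo))) {
  mu_i <- plogis(sum(beta_demo * X_demo[i, ]))
  loop_score <- loop_score + (y_demo[i] - mu_i) * X_demo[i, ]
}

# TRUE when the two computations agree up to floating-point tolerance
all.equal(score_vec(beta_demo, X_demo, y_demo), loop_score)
```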

III. Fisher information matrix

This function computes the Fisher information matrix:

Fishers_matrix <- function(beta,X,y)
{
  # n denotes the number of observations in the data matrix X
  n <- dim(X)[1]

  # p denotes the number of parameters chosen for the logistic regression model
  p <- dim(X)[2]

  # providing initial value for the Fishers Information Matrix
  sum_1 <- matrix(rep(0,p*p), nrow = p, ncol = p)

  for (i in 1:n) 
  {
    sum_1 <- sum_1 +  (((X[i,] %*% t(X[i,])) * h_b_x(beta,X[i,]) * ( 1 - h_b_x(beta,X[i,]))))
  }

  # sum_1 is a matrix of length p * p 
  return(sum_1)
}
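
The same matrix can be written as t(X) %*% W %*% X, where W is diagonal with entries mu_i * (1 - mu_i). The sketch below assumes that form; fisher_vec and the simulated data are hypothetical and used only to check the loop against the matrix expression.

```r
# Vectorized Fisher information for logistic regression: X' W X
fisher_vec <- function(beta, X) {
  stopifnot(ncol(X) == length(beta))
  mu <- plogis(drop(X %*% beta))   # fitted probabilities, one per row
  w  <- mu * (1 - mu)              # diagonal of W
  crossprod(X, X * w)              # X * w scales row i of X by w[i]
}

# Small simulated example (not the Titanic data)
set.seed(2)
X_demo <- cbind(1, matrix(rnorm(20), nrow = 10))
beta_demo <- rep(0.01, ncol(X_demo))

# Loop version, accumulating one outer product per row
loop_fisher <- matrix(0, ncol(X_demo), ncol(X_demo))
for (i in seq_len(nrow(X_demo))) {
  mu_i <- plogis(sum(beta_demo * X_demo[i, ]))
  loop_fisher <- loop_fisher + tcrossprod(X_demo[i, ]) * mu_i * (1 - mu_i)
}

# TRUE when the two computations agree up to floating-point tolerance
all.equal(fisher_vec(beta_demo, X_demo), loop_fisher)
```

Note that Fishers_matrix accepts y but never uses it, which is expected: for logistic regression the Fisher information depends only on beta and X.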

IV. EDA with the Titanic dataset

This part of the code lets you reproduce the EDA I performed:

library(titanic)
data("titanic_train")
data("titanic_test")


library(dplyr)
titanic_test$Survived <- 2
complete_data <- rbind(titanic_train, titanic_test)
complete_data$Embarked[complete_data$Embarked==""] <- "S"
complete_data$Age[is.na(complete_data$Age)] <- median(complete_data$Age, na.rm = TRUE)
complete_data <- as.data.frame(complete_data)
titanic_data <- select(complete_data,-c(Cabin, PassengerId, Ticket, Name))

titanic_data <- titanic_data[!titanic_data$Survived == "2", ]

y_vals <- titanic_data$Survived



x <- as.data.frame(titanic_data[,-1])



X <- model.matrix(y_vals~.,data = x)

# Specifying initial value for beta
beta <- as.numeric(rep(0.01, dim(X)[2]))

V. The problem I am facing

I have split the data into three parts:

X_1 <- X[1:297,]
X_2 <- X[298:594,]
X_3<- X[595:891,]

y_1 <- y_vals[1:297]
y_2 <- y_vals[298:594]
y_3 <- y_vals[595:891]

F_1 <- Fishers_matrix(beta,X_1,y_1)
F_2 <- Fishers_matrix(beta,X_2,y_2)
F_3 <- Fishers_matrix(beta,X_3,y_3)
F_4 <- score(beta,X_1,y_1)
F_5 <- score(beta,X_2,y_2)
F_6 <- score(beta,X_3,y_3)

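Interestingly, the sum of the Fisher information held by each party equals the Fisher information based on the aggregated data. However, the same does not hold for the score function. Given that the score function (shown in II above) can be computed as a sum over the rows of the data (after adjusting for constants), I am struggling to understand why this is the case.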

# Sum of each party's Fishers Info and Score function
Fisher_sum <- F_1 + F_2 + F_3
Score_sum <- F_4 + F_5 + F_6

# Fisher info and Score function of aggregate information
Fish_full <- Fishers_matrix(beta,X,y_vals)
score_full <- score(beta,X,y_vals)

sum (Fisher_sum - Fish_full)

sum(Score_sum - score_full )
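
One possible explanation, offered as a sketch rather than a confirmed diagnosis: in score as posted, a comment describes n but no line assigns it, whereas Fishers_matrix sets n <- dim(X)[1]. R would therefore look n up outside the function, so the loop bound would not follow the matrix that is passed in. The example below uses simulated data and assumes that a global n equal to one block's row count happens to exist; score_global_n, score_local_n, and the block layout are hypothetical names used only for illustration.

```r
h_b_x <- function(beta, x) plogis(sum(beta * x))

# Same loop as the posted score(): n is read from outside the function
score_global_n <- function(beta, X, y) {
  p <- dim(X)[2]
  sum_1 <- rep(0, p)
  for (i in 1:n) {
    sum_1 <- sum_1 + (y[i] - h_b_x(beta, X[i, ])) * X[i, ]
  }
  sum_1
}

# Variant that takes n from the matrix it is given
score_local_n <- function(beta, X, y) {
  n <- dim(X)[1]
  p <- dim(X)[2]
  sum_1 <- rep(0, p)
  for (i in 1:n) {
    sum_1 <- sum_1 + (y[i] - h_b_x(beta, X[i, ])) * X[i, ]
  }
  sum_1
}

set.seed(3)
X_demo <- cbind(1, matrix(rnorm(24), nrow = 12))   # 12 simulated rows
y_demo <- rbinom(12, size = 1, prob = 0.5)
beta_demo <- rep(0.01, ncol(X_demo))
blocks <- list(1:4, 5:8, 9:12)

n <- 4   # ASSUMPTION: a leftover global n equal to one block's size

# Sum a score function over the three blocks
sum_by_block <- function(f) {
  Reduce(`+`, lapply(blocks, function(idx) {
    f(beta_demo, X_demo[idx, ], y_demo[idx])
  }))
}

# With the global n, the "full" score only covers the first 4 rows
gap_global <- sum_by_block(score_global_n) -
  score_global_n(beta_demo, X_demo, y_demo)

# With a local n, the block scores add up to the full score
gap_local <- sum_by_block(score_local_n) -
  score_local_n(beta_demo, X_demo, y_demo)

gap_global
gap_local
```

If this is what happened, adding n <- dim(X)[1] at the top of score should make the two sums agree. If no n exists anywhere, the call would instead stop with an "object 'n' not found" error, and a global n larger than a block's row count would give a subscript error, so the exact symptom depends on the session state.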

I would appreciate any help. I believe there is an error in the code here (but if my statistics are wrong, I am happy to be corrected).

0 answers:

No answers yet