Question

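I have built a binomial glm model. The model predicts the output between two potential classes: AD or Control. These variables are factors with the levels {AD, Control}. I use this model to predict and obtain a probability for each sample, but it is not clear to me whether a probability above 0.5 indicates AD or Control.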

Here is my dataset:

> head(example)
          cleaned_mayo$Diagnosis pca_results$x[, 1]
1052_TCX                      AD          0.9613241
1104_TCX                      AD         -0.9327390
742_TCX                       AD          1.6908874
1945_TCX                 Control          0.6819104
134_TCX                       AD          0.5184748
11386_TCX                Control          0.4669661

Here is my code to fit the model and make predictions:

# Randomize rows of top performer
example<- example[sample(nrow(example)),]

# Subset data for training and testing
N_train<- round(nrow(example)*0.75)
train<- example[1:N_train,]
test<- example[(N_train+1):nrow(example),]
colnames(train)[1:2]<- c("Diagnosis", "Eigen_gene")
colnames(test)[1:2]<- c("Diagnosis", "Eigen_gene")

# Build model and predict   
model_IFGyel<- glm(Diagnosis ~ Eigen_gene, data = train, family = binomial())
pred<- predict(model_IFGyel, newdata= test, type= "response")

# Convert predictions to accuracy metric
pred[which(pred<0.5)]<- "AD"
pred[which(pred!="AD")]<- "Control"
test$Diagnosis<- as.character(test$Diagnosis)
example_acc<- sum(test$Diagnosis==pred, na.rm = T)/nrow(test)

Any help clarifying what these predicted probabilities indicate would be appreciated.

Answer 1

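From ?glm we note:

Details:

     A typical predictor has the form 'response ~ terms' where
     'response' is the (numeric) response vector and 'terms' is a
     series of terms which specifies a linear predictor for
     'response'.  For 'binomial' and 'quasibinomial' families the
     response can also be specified as a 'factor' (when the first
     level denotes failure and all others success) or as a two-column
     matrix with the columns giving the numbers of successes and
     failures.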

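The key part is the parenthetical about factors: the first level denotes failure and all others success. Assuming you did not specify the levels yourself (i.e. R assigned them by default, in alphabetical order), AD is the failure and Control is the success. The coefficients and the model therefore describe the probability that an observation is in the Control class.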

If you want to change this, use factor(...., levels = c('Control', 'AD')), or simply take 1 - prob(Control) (that is, 1 - the predicted value) to express the result in terms of AD.
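To make the level ordering concrete, here is a minimal sketch on simulated data. The data frame dat, the variables eigen_gene and diagnosis, and the slope of 1.5 are all hypothetical stand-ins, not the asker's dataset. It checks which level R treats as failure, refits with the levels reversed, and confirms that the two sets of fitted probabilities are complements of each other.

```r
# Simulated stand-in data (hypothetical; not the asker's dataset)
set.seed(1)
n <- 200
eigen_gene <- rnorm(n)
p_control <- plogis(1.5 * eigen_gene)   # Control becomes likelier as eigen_gene grows
diagnosis <- factor(ifelse(runif(n) < p_control, "Control", "AD"))
dat <- data.frame(Diagnosis = diagnosis, Eigen_gene = eigen_gene)

# Default levels are sorted alphabetically; the first one is the "failure"
levels(dat$Diagnosis)

# Default fit: predicted values are P(Control)
fit_control <- glm(Diagnosis ~ Eigen_gene, data = dat, family = binomial())
p_ctrl <- predict(fit_control, type = "response")

# Put Control first so that AD becomes the modelled ("success") class
dat$Diagnosis_AD <- factor(dat$Diagnosis, levels = c("Control", "AD"))
fit_ad <- glm(Diagnosis_AD ~ Eigen_gene, data = dat, family = binomial())
p_ad <- predict(fit_ad, type = "response")   # now P(AD)

# Same model, mirrored: P(AD) = 1 - P(Control)
all.equal(unname(p_ad), unname(1 - p_ctrl), tolerance = 1e-6)

# Turn probabilities into labels without overwriting the numeric vector
pred_label <- ifelse(p_ctrl > 0.5, "Control", "AD")
mean(pred_label == as.character(dat$Diagnosis))   # in-sample accuracy
```

Calling levels() on the response before fitting is the quickest way to see which class the predicted probability refers to: it is the probability of not being in the first level.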

GLM regression predictions - understanding which factor level is success
