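Recently I had to analyze a dataset with a multinomial regression framework, using both R and SPSS. We surveyed some participants (aged 10 to 12), asking which "professional field" they liked best and then how often they access the internet. The outcome is therefore a categorical variable, professional field, with the levels "Military", "I dont know" and "Other profession". The independent variable is also categorical: how often you access the internet ("I dont have internet access", "1-3 hours/day", "3-5 hours/day").
I ran a model in R (with the nnet package, via the multinom function), and another statistician ran it in SPSS. All reference categories were defined correctly.
Now, when we compare the results, they disagree for the second category of my independent variable. The first one is fine.
Please see the entire code: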
library(tidyverse)
library(stargazer)
library(nnet)
ds <- ds %>% mutate(internet = factor(internet))
ds <- ds %>% mutate(internet = relevel(internet, ref = "I dont have internet access"))
ds <- ds %>% mutate(field = factor(field))
ds <- ds %>% mutate(field = relevel(field, ref = "I dont know"))
mod <- multinom(field ~ internet, data = ds, maxit=1000, reltol = 1.0e-9)
stargazer(mod, type = 'text')
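Because the comparison with SPSS depends on both programs using the same reference categories, it can help to confirm that relevel() actually took effect. The following is a minimal sketch on a small made-up data frame (`toy` is hypothetical and is not the survey data). It relies only on base R, where the first factor level is the one treated as the reference.

```r
# Hypothetical toy data, not the survey data from this question.
toy <- data.frame(
  internet = c("1-3 hours/day", "I dont have internet access", "3-5 hours/day"),
  field    = c("Military", "I dont know", "Other profession"),
  stringsAsFactors = FALSE
)

# relevel() only matters if its result is assigned back to the same column.
toy$internet <- relevel(factor(toy$internet), ref = "I dont have internet access")
toy$field    <- relevel(factor(toy$field), ref = "I dont know")

# The first level of each factor is the reference category.
levels(toy$internet)[1]
levels(toy$field)[1]
```

On the real data, the same check would be levels(ds$internet) and levels(ds$field) right before the multinom() call.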
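For clarity, when the independent variable has only two categories (for example sex: male and female), R and SPSS agree in their results.
After struggling to understand the difference between the two results, I read that nnet estimation could have some problems (optimization issues?) and that the discrepancy in results is not as strange as I thought at the beginning.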
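The optimization concern can at least be inspected on the R side. The sketch below fits multinom() to hypothetical toy data (`toy`, `y` and `x` are made up for illustration) and then looks at the `convergence` flag of the fitted object and at the standard errors from summary(). It does not show why another program would differ, only how to see whether the optimizer ran into the iteration limit.

```r
library(nnet)

# Hypothetical toy data (not the survey data): 3-level outcome, 3-level predictor.
toy <- data.frame(
  y = factor(rep(c("a", "b", "c"), times = c(30, 20, 10))),
  x = factor(rep(c("low", "mid", "high"), length.out = 60))
)

# trace = FALSE silences the iteration log; maxit and reltol mirror the call above.
fit <- multinom(y ~ x, data = toy, maxit = 1000, reltol = 1.0e-9, trace = FALSE)

# 0 means the optimizer stopped before reaching maxit, 1 means it hit the limit.
fit$convergence

# Very large standard errors hint at coefficients the data barely identify.
summary(fit)$standard.errors
```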
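Can someone explain to me what is happening here? What am I missing?! I assumed that if we run the same model, SPSS and R must give the same results.
Thanks
Here is the ds I used in this example: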
ds <- structure(list(sex = structure(c(2L, 1L, 2L, 1L, 2L, 1L, 2L,
2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 1L), .Label = c("male", "female"), class = "factor"), internet = structure(c(3L,
3L, 2L, 3L, 2L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 3L, 2L, 2L, 3L, 2L,
2L, 2L, 2L, 3L, 3L, 2L, 2L, 3L, 1L, 3L, 2L, 2L, 2L, 3L, 3L, 3L,
2L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 1L, 2L, 2L, 3L, 3L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 1L, 2L, 3L, 1L, 2L, 2L, 2L, 3L, 3L, 2L,
2L, 1L, 3L, 2L, 2L, 3L, 2L, 2L), .Label = c("I dont have internet access",
"1-3 hours/day", "3-5 hours/day"), class = "factor"), field = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 1L,
1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("I dont know", "Military",
"Other profession"), class = "factor")), class = "data.frame", row.names = c(NA,
-73L))
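Answer 0 (score: 0)
You can alternatively use mlogit, which comes closer to the SPSS results. The SPSS values should be valid, since Stata yields similar results (-14.88 (982.95), 11.58 (982.95), 11.44 (982.95)). The remaining deviation probably stems from the absurd lack of significance of "Other profession", whose standard errors are far larger than the coefficients themselves.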
library(mlogit)
ml.dat <- mlogit.data(ds, choice="field", shape="wide")
ml <- mlogit(field ~ 1 | internet, data=ml.dat)
yielding
texreg::screenreg(ml)
=========================================================
                                                 Model 1
---------------------------------------------------------
Military:(intercept)                               -0.41
                                                   (0.91)
Other profession:(intercept)                      -16.89
                                                (2690.89)
Military:factor(internet)1-3 hours/day             -1.50
                                                   (1.06)
Other profession:factor(internet)1-3 hours/day     13.60
                                                (2690.89)
Military:factor(internet)3-5 hours/day             -1.64
                                                   (1.06)
Other profession:factor(internet)3-5 hours/day     13.46
                                                (2690.89)
---------------------------------------------------------
AIC                                                85.49
Log Likelihood                                    -36.74
Num. obs.                                             73
=========================================================
*** p < 0.001, ** p < 0.01, * p < 0.05
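An intercept near -17 with a standard error in the thousands is the pattern one typically sees when an outcome category never occurs at the reference level of a predictor. In that situation the likelihood is almost flat in that direction, and different programs may simply stop at different large values. Whether this applies here can be checked with a cross-tabulation. The sketch below uses hypothetical toy counts (`toy` and its short level names are made up and are not the survey data).

```r
# Hypothetical toy counts, not the survey data: "Other profession" never
# occurs in the reference internet category ("none").
toy <- data.frame(
  internet = factor(rep(c("none", "1-3h", "3-5h"), times = c(4, 10, 10)),
                    levels = c("none", "1-3h", "3-5h")),
  field = factor(c(rep("I dont know", 3), "Military",
                   rep("I dont know", 8), "Military", "Other profession",
                   rep("I dont know", 8), "Military", "Other profession"),
                 levels = c("I dont know", "Military", "Other profession"))
)

# Cross-tabulate predictor against outcome.
tab <- table(toy$internet, toy$field)
tab

# List the empty cells: the log-odds tied to them have no finite
# maximum-likelihood estimate.
which(tab == 0, arr.ind = TRUE)
```

For the data in the question, the equivalent call would be table(ds$internet, ds$field).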