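I'm trying to use mlogit in R. I'm fairly new to logit models, and I'm having trouble setting up my problem in the mlogit framework. I'm actually not entirely sure mlogit is the right approach. Here is an analogous problem.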
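Consider a baseball dataset whose outcome variable takes the values "out", "single", "double", "triple", and "homerun". As explanatory variables we have the hitter's name, the pitcher's name, and the stadium. Each hitter has hundreds of observations, including many of the same hitter facing the same pitcher.
I think this is definitely a multinomial logit, since I have several categorical outcomes, but I'm not sure, because all the documentation seems to deal with a "choice" among alternatives, which this isn't. I tried to start my logit model by setting up one factor variable for the hitter, another for the pitcher, and another for the stadium. When I try this in R, I get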
Error in row.names<-.data.frame(*tmp*, value = value) : invalid 'row.names' length
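From some googling, I think it may expect only one observation per combination of hitter, pitcher, and park. Or maybe not? What am I doing wrong, and how should I set this up?
Edit: a sample of the data is here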
https://docs.google.com/spreadsheets/d/19fiq_QEMj4nAPcTqIRxeaYNPgqeHxKAEuPrfHMeIJ7o/edit?usp=sharing
Answer 0 (score: 1)
Here are some suggestions on how to start analyzing your data.
# Your dataset
dts <- structure(list(outcome = c(1L, 1L, 2L, 3L, 1L, 3L, 2L, 3L, 3L,
3L, 3L, 1L, 2L, 2L, 2L, 1L, 3L, 2L, 2L, 2L, 1L, 2L, 3L, 2L, 2L,
2L, 2L, 1L, 1L, 2L, 3L, 2L, 3L, 1L, 2L, 2L, 3L, 2L, 3L, 3L, 3L,
2L, 1L, 1L, 1L, 2L, 3L, 2L, 1L), hitter = structure(c(3L, 3L,
3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 2L, 2L, 2L, 1L, 1L, 1L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("james",
"jill", "john"), class = "factor"), pitcher = structure(c(3L,
3L, 1L, 1L, 1L, 1L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 3L, 1L, 1L,
2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 3L, 2L, 1L, 2L, 3L, 2L,
3L, 2L, 1L, 1L, 2L, 2L, 1L, 3L, 3L, 1L, 2L, 2L, 1L, 1L, 2L, 2L
), .Label = c("bill", "bob", "brett"), class = "factor"), place = structure(c(3L,
3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 5L,
5L, 5L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L
), .Label = c("ca", "co", "dc", "ny", "tn"), class = "factor")), .Names = c("outcome",
"hitter", "pitcher", "place"), class = "data.frame", row.names = c(NA,
-49L))
# Estimation of a multinomial logistic regression model
library(mlogit)
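# Reshape to the format mlogit expects; the source data has one row per plate appearance
dts.wide <- mlogit.data(dts, choice="outcome", shape="wide")
# In the mlogit formula, terms before "|" are alternative-specific (none here, so "1");
# terms after "|" vary by observation and get one coefficient per non-reference outcome
fit.mlogit <- mlogit(outcome ~ 1 | hitter+pitcher+place, data=dts.wide)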
# Results
library(stargazer)
stargazer(fit.mlogit, type="text")
# Model coefficients with standard errors and statistical significance (stars)
==========================================
                   Dependent variable:
               ---------------------------
                         outcome
------------------------------------------
2:(intercept)            19.456
                       (3,056.626)
3:(intercept)            35.179
                       (4,172.540)
2:hitterjill            -17.543
                       (3,056.625)
3:hitterjill            -33.117
                       (4,172.540)
2:hitterjohn             -0.188
                         (0.996)
3:hitterjohn             -1.410
                         (1.056)
2:pitcherbob             -0.070
                         (1.005)
3:pitcherbob             -1.270
                         (1.091)
2:pitcherbrett           -0.908
                         (1.063)
3:pitcherbrett          -2.284*
                         (1.257)
2:placeco                -1.655
                         (1.557)
3:placeco               -17.688
                       (2,840.270)
2:placedc               -19.428
                       (3,056.626)
3:placedc               -34.479
                       (4,172.540)
2:placeny               -18.802
                       (3,056.625)
3:placeny               -32.873
                       (4,172.540)
2:placetn               -18.885
                       (3,056.626)
3:placetn               -32.140
                       (4,172.540)
------------------------------------------
Observations               49
R2                        0.155
Log Likelihood           -44.605
LR Test             16.388 (df = 18)
==========================================
Note:          *p<0.1; **p<0.05; ***p<0.01
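Several coefficients in the output above have standard errors in the thousands (the intercepts, hitterjill, and most of the place terms). With only 49 rows this is often a symptom of sparse or empty cells: in the posted sample, for instance, every "ca" row appears to belong to "jill", so those two effects are hard to tell apart. The sketch below shows one way to look for such cells with base R cross-tabulations. The `toy` data frame is made up for illustration and is not the asker's data.

```r
# Hypothetical toy data (not the asker's): every "ca" row belongs to "jill".
toy <- data.frame(
  outcome = factor(c(1, 2, 3, 2, 1, 2, 3, 1)),
  hitter  = factor(c("jill", "jill", "jill", "james", "james", "john", "john", "john")),
  place   = factor(c("ca", "ca", "ca", "ny", "ny", "ny", "dc", "dc"))
)

# Cross-tabulate the predictors against each other and against the outcome.
hitter_by_place  <- table(toy$hitter, toy$place)
outcome_by_place <- table(toy$outcome, toy$place)
print(hitter_by_place)
print(outcome_by_place)

# Count the hitter/place combinations that never occur together.
n_empty <- sum(hitter_by_place == 0)
```

Running the same tables on the real data (for example `table(dts$hitter, dts$place)`) shows which combinations are missing. With hundreds of observations per hitter the full dataset may not have this problem.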
For more details on estimating multinomial logistic models in R, see here.
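Since the outcome here is not really a choice among alternatives, a possible alternative is `multinom()` from the nnet package, which fits the same kind of multinomial logit directly on data with one row per observation and needs no reshaping. This is a different tool from the mlogit call above, not a restatement of it. The data below are simulated, and the outcome levels and probabilities are assumptions chosen only to make the sketch runnable.

```r
library(nnet)  # recommended package, shipped with standard R installations

# Simulated plate appearances; names and probabilities are illustrative only.
set.seed(1)
n <- 300
toy <- data.frame(
  hitter  = factor(sample(c("james", "jill", "john"), n, replace = TRUE)),
  pitcher = factor(sample(c("bill", "bob", "brett"), n, replace = TRUE))
)
toy$outcome <- factor(
  sample(c("out", "single", "double"), n, replace = TRUE, prob = c(0.6, 0.3, 0.1)),
  levels = c("out", "single", "double")  # first level ("out") is the reference
)

# One row per observation; repeated hitter/pitcher combinations are fine.
fit <- multinom(outcome ~ hitter + pitcher, data = toy, trace = FALSE)

# One row of coefficients per non-reference outcome.
coefs <- coef(fit)

# Predicted outcome probabilities for one hypothetical matchup.
new_pa <- data.frame(
  hitter  = factor("jill", levels = levels(toy$hitter)),
  pitcher = factor("bob",  levels = levels(toy$pitcher))
)
probs <- predict(fit, newdata = new_pa, type = "probs")
```

With hundreds of distinct hitters and pitchers, either approach estimates a very large number of dummy coefficients, so the sparse-data concerns still apply.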