Setting up mlogit in R with many observations per category

Asked: 2017-07-04 00:54:57

Tags: r mlogit

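I'm trying to use mlogit in R. I'm fairly new to logit models, and I'm having trouble setting up my problem in the mlogit framework. I'm actually not entirely sure that mlogit is the right approach. Here is an analogous problem.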

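Consider a baseball dataset whose outcome variable takes the values "out", "single", "double", "triple", and "homerun". As explanatory variables, we have the hitter's name, the pitcher's name, and the stadium. There are hundreds of observations per hitter, including many in which a hitter faces the same pitcher.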

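I think this is definitely a multinomial logit, since I have multiple categorical outcomes, but I'm not sure, because all the documentation seems to deal with a "choice" among alternatives, which is not what I have. I tried to start my logit model by setting up one factor variable for the hitter, another for the pitcher, and another for the stadium. When I try this in R, I get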

Error in row.names<-.data.frame(*tmp*, value = value) : invalid 'row.names' length

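After some googling, I think it may expect only one observation per combination of hitter, pitcher, and park? Maybe not? What am I doing wrong, and how should I set this up?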

Edit: sample data here

https://docs.google.com/spreadsheets/d/19fiq_QEMj4nAPcTqIRxeaYNPgqeHxKAEuPrfHMeIJ7o/edit?usp=sharing

1 answer:

Answer 0: (score: 1)

Here are some suggestions on how to start analyzing your data.

# Your dataset
dts <- structure(list(outcome = c(1L, 1L, 2L, 3L, 1L, 3L, 2L, 3L, 3L, 
3L, 3L, 1L, 2L, 2L, 2L, 1L, 3L, 2L, 2L, 2L, 1L, 2L, 3L, 2L, 2L, 
2L, 2L, 1L, 1L, 2L, 3L, 2L, 3L, 1L, 2L, 2L, 3L, 2L, 3L, 3L, 3L, 
2L, 1L, 1L, 1L, 2L, 3L, 2L, 1L), hitter = structure(c(3L, 3L, 
3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 2L, 2L, 2L, 1L, 1L, 1L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("james", 
"jill", "john"), class = "factor"), pitcher = structure(c(3L, 
3L, 1L, 1L, 1L, 1L, 2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 3L, 1L, 1L, 
2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 3L, 2L, 1L, 2L, 3L, 2L, 
3L, 2L, 1L, 1L, 2L, 2L, 1L, 3L, 3L, 1L, 2L, 2L, 1L, 1L, 2L, 2L
), .Label = c("bill", "bob", "brett"), class = "factor"), place = structure(c(3L, 
3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 5L, 
5L, 5L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L
), .Label = c("ca", "co", "dc", "ny", "tn"), class = "factor")), .Names = c("outcome", 
"hitter", "pitcher", "place"), class = "data.frame", row.names = c(NA, 
-49L))

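# Estimation of a multinomial logistic regression model
library(mlogit)
# shape="wide": one row per observation; repeated hitter/pitcher/place
# combinations are allowed.
# (Recent mlogit releases favor dfidx::dfidx() over mlogit.data().)
dts.wide <- mlogit.data(dts, choice="outcome", shape="wide")
# "1": no alternative-specific covariates; the individual-specific
# factors go after the "|".
fit.mlogit <- mlogit(outcome ~ 1 | hitter+pitcher+place, data=dts.wide)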

# Results
library(stargazer)
stargazer(fit.mlogit, type="text")

# Model coefficients with standard errors and statistical significance (stars)
==========================================
                   Dependent variable:    
               ---------------------------
                         outcome          
------------------------------------------
2:(intercept)            19.456           
                       (3,056.626)        

3:(intercept)            35.179           
                       (4,172.540)        

2:hitterjill             -17.543          
                       (3,056.625)        

3:hitterjill             -33.117          
                       (4,172.540)        

2:hitterjohn             -0.188           
                         (0.996)          

3:hitterjohn             -1.410           
                         (1.056)          

2:pitcherbob             -0.070           
                         (1.005)          

3:pitcherbob             -1.270           
                         (1.091)          

2:pitcherbrett           -0.908           
                         (1.063)          

3:pitcherbrett           -2.284*          
                         (1.257)          

2:placeco                -1.655           
                         (1.557)          

3:placeco                -17.688          
                       (2,840.270)        

2:placedc                -19.428          
                       (3,056.626)        

3:placedc                -34.479          
                       (4,172.540)        

2:placeny                -18.802          
                       (3,056.625)        

3:placeny                -32.873          
                       (4,172.540)        

2:placetn                -18.885          
                       (3,056.626)        

3:placetn                -32.140          
                       (4,172.540)        

------------------------------------------
Observations               49             
R2                        0.155           
Log Likelihood           -44.605          
LR Test             16.388 (df = 18)      
==========================================
Note:          *p<0.1; **p<0.05; ***p<0.01
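
To read the table: each coefficient prefixed with "2:" or "3:" is a log-odds contribution for that outcome relative to outcome 1, the baseline. The sketch below turns the printed coefficients into predicted probabilities for one illustrative profile (hitter john, pitcher bob, place ny). The profile and the variable names eta2, eta3, eta and probs are choices made here for illustration and are not part of the original answer.

```r
# Coefficients copied from the stargazer table above (outcome 1 is the baseline).
# Illustrative profile: hitter = john, pitcher = bob, place = ny.
eta2 <- 19.456 + (-0.188) + (-0.070) + (-18.802)  # log-odds of outcome 2 vs 1
eta3 <- 35.179 + (-1.410) + (-1.270) + (-32.873)  # log-odds of outcome 3 vs 1

# The baseline outcome has its linear predictor fixed at 0.
eta <- c("1" = 0, "2" = eta2, "3" = eta3)

# Multinomial logit (softmax) probabilities; they sum to 1 by construction.
probs <- exp(eta) / sum(exp(eta))
print(round(probs, 3))
```

With a fitted model object, predict() on fit.mlogit should give the same kind of probabilities without the manual arithmetic.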

For more details on estimating multinomial logistic models in R, see here
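
Several standard errors in the fitted table are in the thousands, which often points to sparse or empty cells in the data rather than to real effects. A quick way to check is to cross-tabulate the factors. The sketch below rebuilds the hitter and place columns of the sample data compactly with rep(), using the same run lengths as in the dput output, so that it runs on its own; with dts already loaded, table(dts$hitter, dts$place) would be the equivalent call. The name tab is introduced here for illustration.

```r
# Compact reconstruction of two columns of the 49-row sample data
# (same order and run lengths as in the dput output above).
hitter <- factor(rep(c("john", "james", "jill", "john", "jill", "james", "john"),
                     times = c(3, 13, 8, 2, 3, 3, 17)))
place <- factor(rep(c("dc", "ny", "ca", "co", "tn", "ny"),
                    times = c(7, 9, 7, 9, 3, 14)))

# Counts per hitter/place combination; zeros mark combinations never observed.
tab <- table(hitter, place)
print(tab)

# Number of empty cells in the cross-tabulation.
print(sum(tab == 0))
```

In this sample, the only hitter observed at ca is jill, so the reference cell of the model (james at ca) has no observations. That is one plausible reason the intercepts and the place effects are so poorly determined; a larger dataset, or coarser groupings, may behave better.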