我想在R中运行两阶段的probit最小二乘回归。有谁知道怎么做?那里有包裹吗?我知道使用Stata可以做到这一点,所以我想可以用R做它。
答案 0 :(得分:8)
当你说'两阶段 - 概率最小二乘'时,你可能想要更具体一些。由于您参考实现此目的的Stata程序,我猜您正在讨论CDSIMEQ程序包,它实现了Heckit模型的Amemiya(1978)程序(a.k.a Generalized Tobit,a.k.a.Tobit II型模型等)。正如格兰特所说,systemfit将为你做一个托比特,但不是两个方程式。 MicEcon软件包确实有一个Heckit(但是这个软件包分裂很多次,我不知道它现在在哪里)。
如果你想要CDSIMEQ的功能,可以很容易地在R中实现。我写了一个复制CDSIMEQ的函数:
tspls <- function(formula1, formula2, data) {
# The Continous model
mf1 <- model.frame(formula1, data)
y1 <- model.response(mf1)
x1 <- model.matrix(attr(mf1, "terms"), mf1)
# The dicontionous model
mf2 <- model.frame(formula2, data)
y2 <- model.response(mf2)
x2 <- model.matrix(attr(mf2, "terms"), mf2)
# The matrix of all the exogenous variables
X <- cbind(x1, x2)
X <- X[, unique(colnames(X))]
J1 <- matrix(0, nrow = ncol(X), ncol = ncol(x1))
J2 <- matrix(0, nrow = ncol(X), ncol = ncol(x2))
for (i in 1:ncol(x1)) J1[match(colnames(x1)[i], colnames(X)), i] <- 1
for (i in 1:ncol(x2)) J2[match(colnames(x2)[i], colnames(X)), i] <- 1
# Step 1:
cat("\n\tNOW THE FIRST STAGE REGRESSION")
m1 <- lm(y1 ~ X - 1)
m2 <- glm(y2 ~ X - 1, family = binomial(link = "probit"))
print(summary(m1))
print(summary(m2))
yhat1 <- m1$fitted.values
yhat2 <- X %*% coef(m2)
PI1 <- m1$coefficients
PI2 <- m2$coefficients
V0 <- vcov(m2)
sigma1sq <- sum(m1$residuals ^ 2) / m1$df.residual
sigma12 <- 1 / length(y2) * sum(y2 * m1$residuals / dnorm(yhat2))
# Step 2:
cat("\n\tNOW THE SECOND STAGE REGRESSION WITH INSTRUMENTS")
m1 <- lm(y1 ~ yhat2 + x1 - 1)
m2 <- glm(y2 ~ yhat1 + x2 - 1, family = binomial(link = "probit"))
sm1 <- summary(m1)
sm2 <- summary(m2)
print(sm1)
print(sm2)
# Step 3:
cat("\tNOW THE SECOND STAGE REGRESSION WITH CORRECTED STANDARD ERRORS\n\n")
gamma1 <- m1$coefficients[1]
gamma2 <- m2$coefficients[1]
cc <- sigma1sq - 2 * gamma1 * sigma12
dd <- gamma2 ^ 2 * sigma1sq - 2 * gamma2 * sigma12
H <- cbind(PI2, J1)
G <- cbind(PI1, J2)
XX <- crossprod(X) # X'X
HXXH <- solve(t(H) %*% XX %*% H) # (H'X'XH)^(-1)
HXXVXXH <- t(H) %*% XX %*% V0 %*% XX %*% H # H'X'V0X'XH
Valpha1 <- cc * HXXH + gamma1 ^ 2 * HXXH %*% HXXVXXH %*% HXXH
GV <- t(G) %*% solve(V0) # G'V0^(-1)
GVG <- solve(GV %*% G) # (G'V0^(-1)G)^(-1)
Valpha2 <- GVG + dd * GVG %*% GV %*% solve(XX) %*% solve(V0) %*% G %*% GVG
ans1 <- coef(sm1)
ans2 <- coef(sm2)
ans1[,2] <- sqrt(diag(Valpha1))
ans2[,2] <- sqrt(diag(Valpha2))
ans1[,3] <- ans1[,1] / ans1[,2]
ans2[,3] <- ans2[,1] / ans2[,2]
ans1[,4] <- 2 * pt(abs(ans1[,3]), m1$df.residual, lower.tail = FALSE)
ans2[,4] <- 2 * pnorm(abs(ans2[,3]), lower.tail = FALSE)
cat("Continuous:\n")
print(ans1)
cat("Dichotomous:\n")
print(ans2)
}
为了进行比较,我们可以在article about the package中复制CDSIMEQ作者的样本。
> library(foreign)
> cdsimeq <- read.dta("http://www.stata-journal.com/software/sj3-2/st0038/cdsimeq.dta")
> tspls(continuous ~ exog3 + exog2 + exog1 + exog4,
+ dichotomous ~ exog1 + exog2 + exog5 + exog6 + exog7,
+ data = cdsimeq)
NOW THE FIRST STAGE REGRESSION
Call:
lm(formula = y1 ~ X - 1)
Residuals:
Min 1Q Median 3Q Max
-1.885921 -0.438579 -0.006262 0.432156 2.133738
Coefficients:
Estimate Std. Error t value Pr(>|t|)
X(Intercept) 0.010752 0.020620 0.521 0.602187
Xexog3 0.158469 0.021862 7.249 8.46e-13 ***
Xexog2 -0.009669 0.021666 -0.446 0.655488
Xexog1 0.159955 0.021260 7.524 1.19e-13 ***
Xexog4 0.316575 0.022456 14.097 < 2e-16 ***
Xexog5 0.497207 0.021356 23.282 < 2e-16 ***
Xexog6 -0.078017 0.021755 -3.586 0.000352 ***
Xexog7 0.161177 0.022103 7.292 6.23e-13 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.6488 on 992 degrees of freedom
Multiple R-squared: 0.5972, Adjusted R-squared: 0.594
F-statistic: 183.9 on 8 and 992 DF, p-value: < 2.2e-16
Call:
glm(formula = y2 ~ X - 1, family = binomial(link = "probit"))
Deviance Residuals:
Min 1Q Median 3Q Max
-2.49531 -0.59244 0.01983 0.59708 2.41810
Coefficients:
Estimate Std. Error z value Pr(>|z|)
X(Intercept) 0.08352 0.05280 1.582 0.113692
Xexog3 0.21345 0.05678 3.759 0.000170 ***
Xexog2 0.21131 0.05471 3.862 0.000112 ***
Xexog1 0.45591 0.06023 7.570 3.75e-14 ***
Xexog4 0.39031 0.06173 6.322 2.57e-10 ***
Xexog5 0.75955 0.06427 11.818 < 2e-16 ***
Xexog6 0.85461 0.06831 12.510 < 2e-16 ***
Xexog7 -0.16691 0.05653 -2.953 0.003152 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1386.29 on 1000 degrees of freedom
Residual deviance: 754.14 on 992 degrees of freedom
AIC: 770.14
Number of Fisher Scoring iterations: 6
NOW THE SECOND STAGE REGRESSION WITH INSTRUMENTS
Call:
lm(formula = y1 ~ yhat2 + x1 - 1)
Residuals:
Min 1Q Median 3Q Max
-2.32152 -0.53160 0.04886 0.53502 2.44818
Coefficients:
Estimate Std. Error t value Pr(>|t|)
yhat2 0.257592 0.021451 12.009 <2e-16 ***
x1(Intercept) 0.012185 0.024809 0.491 0.623
x1exog3 0.042520 0.026735 1.590 0.112
x1exog2 0.011854 0.026723 0.444 0.657
x1exog1 0.007773 0.028217 0.275 0.783
x1exog4 0.318636 0.028311 11.255 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7803 on 994 degrees of freedom
Multiple R-squared: 0.4163, Adjusted R-squared: 0.4128
F-statistic: 118.2 on 6 and 994 DF, p-value: < 2.2e-16
Call:
glm(formula = y2 ~ yhat1 + x2 - 1, family = binomial(link = "probit"))
Deviance Residuals:
Min 1Q Median 3Q Max
-2.49610 -0.58595 0.01969 0.59857 2.41281
Coefficients:
Estimate Std. Error z value Pr(>|z|)
yhat1 1.26287 0.16061 7.863 3.75e-15 ***
x2(Intercept) 0.07080 0.05276 1.342 0.179654
x2exog1 0.25093 0.06466 3.880 0.000104 ***
x2exog2 0.22604 0.05389 4.194 2.74e-05 ***
x2exog5 0.12912 0.09510 1.358 0.174544
x2exog6 0.95609 0.07172 13.331 < 2e-16 ***
x2exog7 -0.37128 0.06759 -5.493 3.94e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1386.29 on 1000 degrees of freedom
Residual deviance: 754.21 on 993 degrees of freedom
AIC: 768.21
Number of Fisher Scoring iterations: 6
NOW THE SECOND STAGE REGRESSION WITH CORRECTED STANDARD ERRORS
Continuous:
Estimate Std. Error t value Pr(>|t|)
yhat2 0.25759209 0.1043073 2.46955009 0.01369540
x1(Intercept) 0.01218500 0.1198713 0.10165068 0.91905445
x1exog3 0.04252006 0.1291588 0.32920764 0.74206810
x1exog2 0.01185438 0.1290754 0.09184073 0.92684309
x1exog1 0.00777347 0.1363643 0.05700519 0.95455252
x1exog4 0.31863627 0.1367881 2.32941597 0.02003661
Dichotomous:
Estimate Std. Error z value Pr(>|z|)
yhat1 1.26286574 0.7395166 1.7076909 0.0876937093
x2(Intercept) 0.07079775 0.2666447 0.2655134 0.7906139867
x2exog1 0.25092561 0.3126763 0.8025092 0.4222584495
x2exog2 0.22603717 0.2739307 0.8251618 0.4092797527
x2exog5 0.12911922 0.4822986 0.2677163 0.7889176766
x2exog6 0.95609385 0.2823662 3.3860070 0.0007091758
x2exog7 -0.37128221 0.3265478 -1.1369920 0.2555416141
答案 1 :(得分:2)
R中有几个可用于执行两个状态最小二乘的程序包。这里有几个
请告诉我这些是否符合您的目的。
答案 2 :(得分:2)
systemfit 也可以解决问题。