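Hello Stack community,
I am trying to use linear models to forecast wage growth across the whole United States, including its territories. I want to fit a separate model for each state/territory (including DC, VI and PR), but when I look at the model coefficients, they are identical for every state.
So far I have used a combination of plyr, dplyr and broom to build and arrange my data frame (named stuben_dat) for this project: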
#Wage Growth
state_data = stuben_dat %>%
  group_by(st) %>%
  do(state_wg = lm(wage_growth ~ us_wage_growth + lag_wage_growth +
                     dum1 + dum2 + dum3,
                   data = stuben_dat, subset = yr >= (current_year - 5)))
#The dummy variables adjust for seasonality (q1 vs q2 vs q3 vs q4)
#The current_year = whatever year I last updated the program
#The current_year-5 value lets me change the look back period
#This look back period can be used to exclude recessions or outliers
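Below is just a snapshot of my output. As you can see, the beta coefficients and regression statistics are exactly the same for every state shown here (only AK and AL are visible). What I want instead is a different model for each state.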
# A tibble: 318 x 6
# Groups: st [53]
st term estimate std.error statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 AK (Intercept) -1.75 0.294 -5.97 3.28e- 9
2 AK us_wage_growth 996. 23.6 42.2 1.82e-228
3 AK lag_wage_growth 0.191 0.0205 9.34 5.58e- 20
4 AK dum1 -0.245 0.304 -0.806 4.21e- 1
5 AK dum2 -0.321 0.304 -1.06 2.90e- 1
6 AK dum3 0.0947 0.303 0.312 7.55e- 1
7 AL (Intercept) -1.75 0.294 -5.97 3.28e- 9
8 AL us_wage_growth 996. 23.6 42.2 1.82e-228
9 AL lag_wage_growth 0.191 0.0205 9.34 5.58e- 20
10 AL dum1 -0.245 0.304 -0.806 4.21e- 1
# ... with 308 more rows
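Answer 0 (score: 1)
This happens because every do() call is fitted on the same data: data = stuben_dat refers to the full data frame, not to the rows of the current group. Inside do(), use . to refer to the current group's data. Try: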
state_data = stuben_dat %>%
  group_by(st) %>%
  do(state_wg = lm(wage_growth ~ us_wage_growth + lag_wage_growth +
                     dum1 + dum2 + dum3,
                   data = ., subset = (yr >= (current_year - 5))))
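
To see the difference on data anyone can run, here is a minimal sketch using the built-in mtcars dataset in place of stuben_dat (an assumption for illustration only; cyl stands in for st, and mpg ~ wt for the wage-growth formula). It wraps lm() in broom's tidy() directly inside do(), which is one way to get a coefficient table like the one in the question.

```r
library(dplyr)
library(broom)

# Same mistake as in the question: the model ignores the grouping because
# `data = mtcars` is the full data frame, so every group gets the same fit.
shared <- mtcars %>%
  group_by(cyl) %>%
  do(tidy(lm(mpg ~ wt, data = mtcars)))

# Fix: `.` is the subset of rows belonging to the current group.
per_group <- mtcars %>%
  group_by(cyl) %>%
  do(tidy(lm(mpg ~ wt, data = .)))

# Pull out the slope on `wt` for each group.
shared_slopes <- shared$estimate[shared$term == "wt"]
group_slopes  <- per_group$estimate[per_group$term == "wt"]

print(shared_slopes)  # one value repeated for every cyl group
print(group_slopes)   # a different value for each cyl group
```

The subset argument can be kept exactly as in the answer above; with data = ., it is evaluated against the current group's rows.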