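I am using Stata to estimate rolling standard deviations of ROA (using a window of 4 observations from the prior year). Now I would like to keep only those rolling standard deviations that are based on at least 3 observations (out of 4) of ROA. How can I do this in Stata?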
ROA          roa_sd
.            .
.            .
.            .
.0108869     .
.0033411     .
.0032814     .0053356   (this value should be missing because it was calculated with only 2 valid values)
.0030827     .0043739
.0029793     .0038275
Answer 0: (score: 2)
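Your question is answered by the blog post I link to in the comments above. You can use rolling, then add an extra screen that discards sigma when the number of observations does not meet your threshold.
But for simple calculations like sigma and beta (i.e., the standard deviation and the univariate regression coefficient), you can do much better with a more manual approach. Compare the rolling solution with my manual solution.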
/* generate panel by adapting the linked code */
clear
set obs 20000
gen date = _n
gen id = floor((_n - 1) / 20) + 1
gen roa = int((100) * runiform())
replace roa = . in 1/4
replace roa = . in 10/12
replace roa = . in 18/20
/* solution with rolling */
/* http://statadaily.wordpress.com/2014/03/31/rolling-standard-deviations-and-missing-observations/ */
timer on 1
xtset id date
rolling sd2 = r(sd), window(4) keep(date) saving(f2, replace): sum roa
merge 1:1 date using f2, nogenerate keepusing(sd2)
xtset id date
gen tag = missing(l3.roa) + missing(l2.roa) + missing(l1.roa) + missing(roa) > 1
gen sd = sd2 if (tag == 0)
timer off 1
/* my solution */
timer on 2
rolling_sd roa, window(4) minimum(3)
timer off 2
/* compare */
timer list
list in 1/50
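The timings show that the manual solution is much faster (about 132 seconds for rolling versus about 0.1 seconds for the manual program).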
. /* compare */
. timer list
1: 132.38 / 1 = 132.3830
2: 0.10 / 1 = 0.0990
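Save the following as rolling_sd.ado in your personal ado-file directory (or your current working directory). I am sure someone could simplify this code further. Note that this code has the added advantage of applying the minimum data requirement at the front edge of the window (i.e., it calculates sigma from the first three observations rather than waiting for all four).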
*! 0.2 Richard Herron 3/30/14
* added minimum data requirement
*! 0.1 Richard Herron 1/12/12
program rolling_sd
    version 11.2
    syntax varlist(numeric), window(int) minimum(int)

    * temporary variables and short names for the options
    tempvar n miss xs x2s nonmiss1 nonmiss2 sigma1 sigma2
    local w = `window'
    local m = `minimum'

    * read the panel and time variables from xtset and keep them in locals
    xtset
    local panelvar `r(panelvar)'
    local timevar `r(timevar)'

    * generate cumulative sums and cumulative counts of missing values
    * (sum() treats missing as zero; double limits rounding error when differencing)
    bysort `panelvar' (`timevar'): generate `n' = _n
    by `panelvar': generate `miss' = sum(missing(`varlist'))
    by `panelvar': generate double `xs' = sum(`varlist')
    by `panelvar': generate double `x2s' = sum(`varlist' * `varlist')

    * generate standard deviation 1 (front of window)
    * restricted to the first w observations of each panel so it never spans more than one window
    generate `nonmiss1' = `n' - `miss'
    generate `sigma1' = sqrt((`x2s' - `xs'*`xs'/`nonmiss1')/(`nonmiss1' - 1)) if `n' <= `w' & inrange(`nonmiss1', `m', `w') & !missing(`nonmiss1')

    * generate standard deviation 2 (back of window, main part)
    generate `nonmiss2' = `w' - s`w'.`miss'
    generate `sigma2' = sqrt((s`w'.`x2s' - s`w'.`xs'*s`w'.`xs'/`nonmiss2')/(`nonmiss2' - 1)) if inrange(`nonmiss2', `m', `w') & !missing(`nonmiss2')

    * return standard deviation
    egen sigma = rowfirst(`sigma2' `sigma1')
end
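
Why the manual program can get away with cumulative sums: the sample standard deviation over a window only needs the sum, the sum of squares, and the count of valid values in that window, and each of these is a difference of two running totals. The following is a sketch of that algebra, not part of the original answer. The symbols S_t, Q_t, M_t and k are introduced here for illustration: they correspond to the running sum of roa, the running sum of roa squared, the running count of missing values (missing values add zero to both sums), and the number of valid values in the window of width w ending at t.

```latex
\begin{aligned}
\Delta S_t &= S_t - S_{t-w}, \qquad
\Delta Q_t = Q_t - Q_{t-w}, \qquad
k_t = w - (M_t - M_{t-w}) \\[4pt]
\sum_{j \in \text{window}} (x_j - \bar{x})^2
  &= \sum_{j} x_j^2 - k_t\,\bar{x}^2
   = \Delta Q_t - \frac{(\Delta S_t)^2}{k_t} \\[4pt]
\sigma_t &= \sqrt{\frac{\Delta Q_t - (\Delta S_t)^2 / k_t}{k_t - 1}}
  \qquad \text{reported only if } m \le k_t \le w
\end{aligned}
```

For the first w observations of a panel there is no value w periods back, so the same formula is applied with the running totals themselves (S_t, Q_t and k_t = t - M_t). As a check against the question's data, the three values .0108869, .0033411 and .0032814 give a sum of .0175094 and a sum of squares of about 1.4046e-4, so sigma = sqrt((1.4046e-4 - .0175094^2/3)/2), which is approximately .00437, in line with the .0043739 shown in the question.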