How do I store regression results from a loop in Stata?

Time: 2014-08-19 06:15:22

Tags: stata

I have built a model that essentially does the following:

run regressions on single time period
organise stocks into quantiles based on coefficient from linear regression
statsby to calculate portfolio returns for stocks based on quantile (averaging all quantile x returns)
store quantile 1 and quantile 10 portfolio returns for the last period

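This pair of values is just the final entry in the time window. However, I intend to extend the single time period to a long time range, essentially: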

for i in timeperiod {
    organise stocks into quantiles based on coefficient from linear regression
    statsby to calculate portfolio returns for stocks based on quantile (averaging all quantile x returns)
    store quantile 1 and quantile 10 portfolio returns for the last period
}

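The data I am after are the returns of portfolios 1 and 10 on the last day of each time period (constructed using the previous 3 years of data). This should produce a time series of returns (my data cover 60 years in total, minus 3 years to build the first result, so 57 years), and I can then regress the two series against each other.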

regress portfolio 1 against portfolio 10

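I come from an R background, where storing values in a vector is straightforward, but I am not sure how to do this in Stata.

In the end, I want a 2 x n matrix (a separate dataset) of numbers, where each pair is the result of one rolling regression. Sorry for the very vague description, but it is better than explaining the contents of my model. Any pointers (even just the right manual entry) would be much appreciated. Thanks.

Edit: The actual data I want to store is just a variable. I confused matters by adding the regression. I have changed the code to better represent what I want.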

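3 answers:

Answer 0 (score: 3)

This sounds like a case for rolling or statsby, depending on what exactly you want to do. These are prefix commands, which you put in front of your regression model. rolling or statsby will take care of both the looping and the storing of results for you.

If you want maximum control, you can do the loop yourself with forvalues or foreach and store the results in a separate file using post. In fact, if you look inside rolling and statsby (using viewsource), you will see that this is what these commands do internally.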
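As a rough sketch of the prefix-command route, the snippet below uses two example datasets fetched with webuse (lutkepohl2 and grunfeld). These datasets, the window length of 20, and the output file names rolling_results and by_year_results are assumptions chosen for illustration, not part of the question. With the saving() option, both prefixes write their results to a separate .dta file instead of replacing the data in memory.

```stata
* Rolling regression: one set of coefficients per 20-quarter window
* (dataset, window length and file names are illustrative assumptions)
webuse lutkepohl2, clear
tsset qtr
rolling _b _se, window(20) saving(rolling_results, replace): ///
    regress dln_inv dln_inc

* statsby: one regression per year, one row of coefficients per year
webuse grunfeld, clear
statsby _b _se, by(year) saving(by_year_results, replace): ///
    regress invest mvalue kstock

* Each results file is an ordinary dataset with one row per window or group
use rolling_results, clear
describe
list in 1/5
```

The rolling results file holds the variables start and end, which mark each window, next to the stored coefficients, so it can be used directly as a time series of estimates.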

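Answer 1 (score: 2)

Unlike R, Stata works with only one major rectangular object in memory, called (ta-da!) the data set. (It has plenty of other stuff, of course, but that stuff can rarely be addressed as easily as the data set brought into memory with use.) Since your ultimate goal is to run a regression, you will need to either create an additional data set or add the data to the existing one. Given that your problem is sufficiently custom, you seem to need a custom solution.

Solution 1: create a separate data set using post (see help postfile).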

use my_data, clear
postfile topost int(time_period) str40(portfolio) double(return_q1 return_q10) ///
     using my_derived_data, replace
* 1. topost is a placeholder name
* 2. I have no clue what you mean by "storing the portfolio", so you'd have to fill in
* 3. This will create the file my_derived_data.dta, 
*    which of course you can name as you wish
* 4. The triple slash is a continuation comment: the code is continued on the next line

levelsof time_period, local( allyears )
* 5. This will create a local macro allyears 
*    that contains all the values of time_period

foreach t of local allyears {
   regress outcome x1 x2 x3 if time_period == `t', robust
   * 6. the opening and closing single quotes are references to Stata local macros
   *    Here, I am referring to the cycle index t

   * organise stocks into quantiles based on the coefficient from the linear regression:
   * this isn't making huge sense for me, so you'll have to put your code here
   * don't forget inserting if time_period == `t' as needed
   * something like this:
   predict yhat`t' if time_period == `t', xb
   xtile decile`t' = yhat`t' if time_period == `t', nq(10)

   * calculate portfolio returns for stocks based on quantile:
   forvalues q=1/10 {
        * do whatever if time_period == `t' & decile`t' == `q'
   }

   * store quantile 1 and quantile 10 portfolio returns for the last period
   * again I am not sure what you mean and how to do that exactly
   * so I'll pretend it is something like
   * (coefficient names can differ across Stata versions; replay with -ratio, coeflegend- to check)
   ratio change / price if time_period == `t' , over( decile`t' )
   post topost (`t') ("whatever text describes the time `t' portfolio") /// 
       (_b[_ratio_1:1]) (_b[_ratio_1:10])
   * the last two sets of parentheses may contain whatever numeric answer you are producing
}

postclose topost
* 7. close the file you are creating

use my_derived_data, clear
tsset time_period, yearly
newey return_q10 return_q1, lag(3)
* 8. just in case the business cycles have about 3 years of effect

exit
* 9. you always end your do-files with exit

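Solution 2: keep everything in your current data set. If the code above looks awkward, you can instead create a weird centaur of a data set that holds both your original stocks and the summaries.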

use my_data, clear

gen int collapsed_time_period = .
gen double collapsed_return_q1 = .
gen double collapsed_return_q10 = .
* 1. set up placeholders for your results

levelsof time_period, local( allyears )
* 2. This will create a local macro allyears 
*    that contains all the values of time_period

local T : word count `allyears'
* 3. I now use the local macro allyears as is
*    and count how many distinct values there are of time_period variable

forvalues n=1/`T' {
   * 4. my cycle now only runs for the numbers from 1 to `T'

   local t : word `n' of `allyears'
   * 5. I pull the `n'-th value of time_period

   ** computations as in the previous solution

   replace collapsed_time_period = `t' in `n'
   replace collapsed_return_q1 = (compute) in `n'
   replace collapsed_return_q10 = (compute) in `n'
   * 6. I am filling the pre-arranged variables with the relevant values
}

tsset collapsed_time_period, year
* 7. this will likely complain about missing values, so you may have to fix it
newey collapsed_return_q10 collapsed_return_q1, lag(3)
* 8. just in case the business cycles have about 3 years of effect

exit
* 9. you always end your do-files with exit

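I avoided statsby because it overwrites the data set in memory. Keep in mind that, unlike R, Stata can only remember one data set at a time, so my preference is to avoid excessive I/O operations: they may well be the slowest part of the whole thing if you have a data set of 50+ Mbytes.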
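For readers on a newer release, here is a minimal sketch of the same collect-as-you-loop idea using data frames. It assumes Stata 16 or later, where frames are available; the grunfeld example dataset, the frame name results, and the variable names b_mvalue and b_kstock are illustrative assumptions. The results accumulate in a second frame in memory, so the working data set is neither overwritten nor written to disk.

```stata
* Assumes Stata 16+ (frames); dataset and names are illustrative only
webuse grunfeld, clear

* An empty frame that will receive one row per year
frame create results int year double(b_mvalue b_kstock)

levelsof year, local(allyears)
foreach t of local allyears {
    quietly regress invest mvalue kstock if year == `t'
    * append this year's coefficients as a new row in the results frame
    frame post results (`t') (_b[mvalue]) (_b[kstock])
}

* The original data are still in the default frame; inspect the results
frame results: list in 1/5
```

From here, frame change results makes the collected rows the current data set, after which tsset and the final regression work as in the solutions above.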

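Answer 2 (score: 0)

I think you are looking for the estout command to store regression results. (estout is a user-written package; it can be installed with ssc install estout.)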
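A minimal sketch of that route, assuming the estout package from SSC is installed (it provides eststo and esttab). The auto dataset and the model names m_weight and m_length are illustrative. Note that this keeps full sets of estimates for building tables; it does not produce a data set of returns, so it may not match what the question is after.

```stata
* Requires the user-written estout package:  ssc install estout
sysuse auto, clear
eststo clear

* Store one set of estimates per regressor under a distinct name
foreach v in weight length {
    eststo m_`v': quietly regress price `v'
}

* Show the stored models side by side, with standard errors
esttab m_weight m_length, se
```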