Question

I have built a model that essentially does the following:

run regressions on single time period
organise stocks into quantiles based on coefficient from linear regression
statsby to calculate portfolio returns for stocks based on quantile (averaging all quantile x returns)
store quantile 1 portfolio and quantile 10 return for the last period

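This pair of variables is just the final entry for the time range. However, I intend to extend the single time period to a large range of time, essentially: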

for i in timeperiod {
    organise stocks into quantiles based on coefficient from linear regression
    statsby to calculate portfolio returns for stocks based on quantile (averaging all quantile x returns)
    store quantile 1 portfolio and quantile 10 return for the last period
}

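The data I am after is the portfolio 1 and portfolio 10 returns for the last day of each time period (built using the previous 3 years of data). This should produce a time series of returns (my total data is 60 years, less 3 to build the first result, so 57 years), which I can then regress against each other.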

regress portfolio 1 against portfolio 10

I come from an R background, where storing values in a vector is very simple, but I am not quite sure how to go about this in Stata.

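In the end, I want a 2xn matrix (a separate dataset) of numbers, each pair being the result of one rolling regression. Sorry for the very vague description, but it is better than explaining what my model is about. Any pointers (even if it is just the right manual entry) would be greatly appreciated. Thanks.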

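Edit: the actual data I want to store is just a variable. I made it confusing by adding the regression. I have changed the code to better represent what I want.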

Answer 1

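Sounds like a case for rolling or statsby, depending on what exactly you want to do. These are prefix commands that you put in front of your regression model. rolling or statsby will take care of both the looping and the storing of results for you.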

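If you want maximum control, you can do the looping yourself with forvalues or foreach and store the results in a separate file using post. In fact, if you look inside rolling and statsby (using viewsource), you will see that this is what these commands do internally.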
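A minimal sketch of the prefix approach, using simulated data so that it runs on its own. The variable names (t, x, y), the 36-period window, and the file name rolling_betas are assumptions made for illustration and do not come from the question.

```stata
* Simulated time series; every name here is illustrative
clear
set seed 12345
set obs 100
generate t = _n
generate x = rnormal()
generate y = 0.5*x + rnormal()
tsset t

* Re-estimate the regression on every 36-period window and save
* the coefficients, one row per window, to rolling_betas.dta
rolling _b, window(36) saving(rolling_betas, replace): regress y x

* The results file holds the window bounds (start, end) and the _b_* coefficients
use rolling_betas, clear
describe
list in 1/5
```

The saved file is an ordinary dataset, so it can be merged back or used directly in a later regression.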

Answer 2

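Unlike R, Stata has only one major rectangular object in memory, called (ta-da!) the data set. (It has plenty of other stuff too, of course, but that stuff can rarely be addressed as easily as the data set that is brought into memory with use.) Since your ultimate goal is to run regressions, you will need either to create an additional data set or to add the data to the existing one. Given that your problem is sufficiently custom, you seem to need a custom solution.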

Solution 1: create a separate data set using post (see help postfile).

use my_data, clear
postfile topost int(time_period) str40(portfolio) double(return_q1 return_q10) ///
     using my_derived_data, replace
* 1. topost is a placeholder name
* 2. I have no clue what you mean by "storing the portfolio", so you'd have to fill in
* 3. This will create the file my_derived_data.dta, 
*    which of course you can name as you wish
* 4. The triple slash is a continuation comment: the code is continued on the next line

levelsof time_period, local( allyears )
* 5. This will create a local macro allyears 
*    that contains all the values of time_period

foreach t of local allyears {
   regress outcome x1 x2 x3 if time_period == `t', robust
   * 6. the opening and closing single quotes are references to Stata local macros
   *    Here, I am referring to the cycle index t

   * organise stocks into quantiles based on coefficient from linear regression
   * this isn't making huge sense for me, so you'll have to put your code here
   * don't forget inserting if time_period == `t' as needed
   * something like this:
   predict yhat`t' if time_period == `t', xb
   xtile decile`t' = yhat`t' if time_period == `t', n(10)

   * calculate portfolio returns for stocks based on quantile
   forvalues q=1/10 {
        * do whatever if time_period == `t' & decile`t' == `q'
   }

   * store quantile 1 portfolio and quantile 10 return for the last period
   * again I am not sure what you mean and how to do that exactly
   * so I'll pretend it is something like
   ratio change / price if time_period == `t' , over( decile`t' )
   post topost (`t') ("whatever text describes the time `t' portfolio") /// 
       (_b[_ratio_1:1]) (_b[_ratio_1:10])
   * the last two sets of parentheses may contain whatever numeric answer you are producing
}

postclose topost
* 7. close the file you are creating

use my_derived_data, clear
tsset time_period, year
newey return_q10 return_q1, lag(3)
* 8. just in case the business cycles have about 3 years of effect

exit
* 9. you always end your do-files with exit

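Solution 2: keep everything within your current data set. If the above code looks awkward, you can instead create a weird centaur of a data set that holds both your original stocks and the summaries.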

use my_data, clear

gen int collapsed_time_period = .
gen double collapsed_return_q1 = .
gen double collapsed_return_q10 = .
* 1. set up placeholders for your results

levelsof time_period, local( allyears )
* 2. This will create a local macro allyears 
*    that contains all the values of time_period

local T : word count `allyears'
* 3. I now use the local macro allyears as is
*    and count how many distinct values there are of time_period variable

forvalues n=1/`T' {
   * 4. my cycle now only runs for the numbers from 1 to `T'

   local t : word `n' of `allyears'
   * 5. I pull the `n'-th value of time_period

   ** computations as in the previous solution

   replace collapsed_time_period = `t' in `n'
   replace collapsed_return_q1 = (compute) in `n'
   replace collapsed_return_q10 = (compute) in `n'
   * 6. I am filling the pre-arranged variables with the relevant values
}

tsset collapsed_time_period, year
* 7. this will likely complain about missing values, so you may have to fix it
newey collapsed_return_q10 collapsed_return_q1, lag(3)
* 8. just in case the business cycles have about 3 years of effect

exit
* 9. you always end your do-files with exit

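I avoided statsby because it overwrites the data set in memory. Keep in mind that, unlike R, Stata can hold only one data set in memory at a time, so my preference is to avoid excessive I/O operations: they may well be the slowest part of the whole thing if you have a data set of 50+ Mbytes.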
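For comparison, statsby accepts a saving() option which, as far as I understand the documentation, writes the results to a file and leaves the data in memory as they were. A minimal sketch on the auto data set that ships with Stata; the file name coefs_by_group is an assumption made for illustration.

```stata
sysuse auto, clear

* One regression per level of foreign; coefficients and standard
* errors are written to coefs_by_group.dta
statsby _b _se, by(foreign) saving(coefs_by_group, replace): regress price weight

* The auto data should still be in memory at this point
count

* Load the collected results: one row per group
use coefs_by_group, clear
list
```

Whether this beats post depends on how custom the per-period computation is: statsby collects results from a single command, whereas the post loop can run arbitrary code in each period.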

Answer 3

I think you are looking for the estout command to store the results of your regressions.
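A minimal sketch of that approach, assuming the community-contributed estout package (which provides eststo and esttab) has been installed from SSC. The auto data set and the stored names m_weight and m_length are illustrative only.

```stata
* One-time install, needs an internet connection:
*   ssc install estout

sysuse auto, clear
eststo clear

* Store one set of estimates per loop iteration under its own name
foreach v of varlist weight length {
    eststo m_`v': regress price `v'
}

* Show the stored models side by side with standard errors
esttab m_weight m_length, se
```

Note that this produces a formatted table of stored estimates, not a data set of returns, so it suits reporting more than building a time series for a further regression.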

How do I store regression results from a loop in Stata?
