I have built a model that essentially does the following:
run regressions on single time period
organise stocks into quantiles based on coefficient from linear regression
statsby to calculate portfolio returns for stocks based on quantile (averaging all quantile x returns)
store the quantile 1 and quantile 10 portfolio returns for the last period
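This pair of variables is just the final entry in the time range. However, I intend to extend the single time period to a long time range, essentially: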
for i in timeperiod {
    organise stocks into quantiles based on coefficient from linear regression
    statsby to calculate portfolio returns for stocks based on quantile (averaging all quantile x returns)
    store the quantile 1 and quantile 10 portfolio returns for the last period
}
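What I am after is the portfolio 1 and portfolio 10 returns for the last day of each time period (constructed using the previous 3 years of data). This should yield a time series of returns (my total data covers 60 years, minus 3 years to build the first result, so 57 years) that I can then regress against each other.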
regress portfolio 1 against portfolio 10
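I come from an R background, where storing variables in a vector is very simple, but I am not quite sure how to go about this in Stata.
In the end, I want a 2xn matrix (a separate data set) of numbers, each pair being the result of one rolling regression. Sorry for the very vague description, but it is better than explaining what my model is about. Any pointers (even if it is just the right manual entry) would be much appreciated. Thank you.
Edit: The actual data I want to store is just a variable. I made things confusing by adding the regressions. I have changed the code to better represent what I want.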
Answer 0 (score: 3)
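This sounds like a case for rolling or statsby, depending on exactly what you want to do. These are prefix commands that you put in front of your regression model; rolling or statsby will take care of the looping and of storing the results for you.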
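As a minimal sketch of the prefix approach, the following runs one regression per moving 30-quarter window and saves one row of coefficients per window. The lutkepohl2 example data, the window length and the file name rolling_sketch are illustrative assumptions and have nothing to do with the model in the question; webuse needs internet access.

```stata
* Sketch only: example data and file name are assumptions, not the asker's model
webuse lutkepohl2, clear
tsset qtr

* One regression per 30-quarter window; the coefficients are written to
* rolling_sketch.dta, one observation per window
rolling _b, window(30) saving(rolling_sketch, replace): ///
    regress dln_inv dln_inc dln_consump

* With saving(), the original data stay in memory; load the results to inspect them
use rolling_sketch, clear
describe
```

The saved file is an ordinary data set, so it can be tsset and used in a further regression like any other.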
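If you want maximum control, you can do the looping yourself with forvalues or foreach and store the results in a separate file using post. In fact, if you look inside rolling and statsby (using viewsource), you will see that this is what these commands do internally.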
Answer 1 (score: 2)
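Unlike R, Stata keeps only one major rectangular object in memory, called (ta-da!) the data set. (Of course, it has a whole lot of other stuff, but that stuff can rarely be addressed as easily as the data set that is brought into memory with use.) Since your ultimate goal is to run a regression, you will need to either create an additional data set or add the data to the existing data set. Given that your problem is sufficiently custom, you seem to need a custom solution.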
Solution 1: create a separate data set using post (see help post).
use my_data, clear
postfile topost int(time_period) str40(portfolio) double(return_q1 return_q10) ///
    using my_derived_data, replace
* 1. topost is a placeholder name
* 2. I have no clue what you mean by "storing the portfolio", so you'd have to fill in
* 3. This will create the file my_derived_data.dta,
*    which of course you can name as you wish
* 4. The triple slash is a continuation comment: the code is continued on the next line
levelsof time_period, local( allyears )
* 5. This will create a local macro allyears
*    that contains all the values of time_period
foreach t of local allyears {
    regress outcome x1 x2 x3 if time_period == `t', robust
    * 6. the opening and closing single quotes are references to Stata local macros
    *    Here, I am referring to the cycle index t
    organise_stocks_into_quantiles_based_on_coefficient_from_linear_regression
    * this isn't making huge sense for me, so you'll have to put your code here
    * don't forget inserting if time_period == `t' as needed
    * something like this:
    predict yhat`t' if time_period == `t', xb
    xtile decile`t' = yhat`t' if time_period == `t', n(10)
    calculate_portfolio_returns_for_stocks_based_on_quantile
    forvalues q=1/10 {
        * do whatever if time_period == `t' & decile`t' == `q'
    }
    * store quantile 1 portfolio and quantile 10 return for the last period
    * again I am not sure what you mean and how to do that exactly
    * so I'll pretend it is something like
    ratio change / price if time_period == `t' , over( decile`t' )
    post topost (`t') ("whatever text describes the time `t' portfolio") ///
        (_b[_ratio_1:1]) (_b[_ratio_1:10])
    * the last two sets of parentheses may contain whatever numeric answer you are producing
}
postclose topost
* 7. close the file you are creating
use my_derived_data, clear
tsset time_period, yearly
newey return_q10 return_q1, lag(3)
* 8. just in case the business cycles have about 3 years of effect
exit
* 9. you always end your do-files with exit
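Because the template above contains placeholders, here is a small self-contained sketch of the same postfile / post / postclose pattern that should run as is. Everything in it is an assumption made for illustration: it uses the grunfeld example data (fetched with webuse, which needs internet access), splits firms into two groups by market value instead of ten deciles by a regression coefficient, and treats mean investment as a stand-in for a portfolio return. The names sketch and portfolio_sketch are hypothetical.

```stata
* Sketch only: grunfeld data and all names are stand-ins for the asker's model
webuse grunfeld, clear

* Declare the results file: one row per year, two numeric results
postfile sketch int(year) double(mean_low mean_high) ///
    using portfolio_sketch, replace

levelsof year, local(allyears)
foreach t of local allyears {
    * split this year's firms into a lower and an upper half by market value
    xtile half = mvalue if year == `t', nquantiles(2)

    * average "return" (here: investment) within each half
    summarize invest if year == `t' & half == 1, meanonly
    local low = r(mean)
    summarize invest if year == `t' & half == 2, meanonly
    local high = r(mean)

    * append one row to the results file
    post sketch (`t') (`low') (`high')
    drop half
}
postclose sketch

* The results are now an ordinary yearly time series
use portfolio_sketch, clear
tsset year, yearly
regress mean_high mean_low
```

The results file ends up with one observation per value of year, which is the "2xn" object the question asks for.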
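Solution 2: keep everything within your current data set. If the above code looks awkward, you can instead create a weird centaur of a data set that holds both your original stocks and the summaries.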
use my_data, clear
gen int collapsed_time_period = .
gen double collapsed_return_q1 = .
gen double collapsed_return_q10 = .
* 1. set up placeholders for your results
levelsof time_period, local( allyears )
* 2. This will create a local macro allyears
*    that contains all the values of time_period
local T : word count `allyears'
* 3. I now use the local macro allyears as is
*    and count how many distinct values there are of time_period variable
forvalues n=1/`T' {
    * 4. my cycle now only runs for the numbers from 1 to `T'
    local t : word `n' of `allyears'
    * 5. I pull the `n'-th value of time_period
    ** computations as in the previous solution
    replace collapsed_time_period = `t' in `n'
    replace collapsed_return_q1 = (compute) in `n'
    replace collapsed_return_q10 = (compute) in `n'
    * 6. I am filling the pre-arranged variables with the relevant values
}
tsset collapsed_time_period, yearly
* 7. this will likely complain about missing values, so you may have to fix it
newey collapsed_return_q10 collapsed_return_q1, lag(3)
* 8. just in case the business cycles have about 3 years of effect
exit
* 9. you always end your do-files with exit
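I avoided statsby because it overwrites the data set in memory. Keep in mind that, unlike R, Stata can only remember one data set at a time, so my preference is to avoid excessive I/O operations, as they may well be the slowest part of the whole thing if you have a data set of 50+ Mbytes.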
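For readers who still want to use statsby, here is a hedged sketch of two ways to keep the original data: write the results to a file with saving(), or wrap the call in preserve and restore. Both still involve some disk or memory copying, so the I/O concern above is not removed. The auto example data and the file name statsby_sketch are assumptions for illustration.

```stata
* Sketch only: auto data and file name are assumptions
sysuse auto, clear

* Option A: results go to a file; the data in memory are left untouched
statsby mean_price = r(mean), by(foreign) saving(statsby_sketch, replace): ///
    summarize price
describe, short        // still the auto data

* Option B: let statsby replace the data, look at the results, then restore
preserve
statsby mean_price = r(mean), by(foreign) clear: summarize price
list
restore
```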
Answer 2 (score: 0)
I think you are looking for the estout command to store the results of the regressions.
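A minimal sketch of that route, assuming the user-written estout package has been installed from SSC (it does not ship with Stata). It stores two regressions and prints them side by side; the auto data and the two models are illustrative assumptions. This collects regression output into a table and does not build a time series of portfolio returns.

```stata
* One-time install of the user-written package (needs internet access):
* ssc install estout

sysuse auto, clear
eststo clear                        // drop any previously stored estimates
eststo: regress price mpg           // store the first model
eststo: regress price mpg weight    // store the second model
esttab, se                          // table of coefficients with standard errors
```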