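I have an unbalanced daily panel in which entries occur at irregular times. I would like to generate a rolling sum of some variable x over the past 365 days. I can think of two approaches, but the first is memory-hungry and the second is processor-hungry. Is there a third approach that avoids both problems?
Here are my two solutions.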
clear
set obs 200
set seed 2001
/* panel variables */
generate id = 1 + int(2*runiform())
generate time = mdy(1, 1, 2000) + int(10*365*runiform())
format time %td
duplicates drop
xtset id time
/* data */
generate x = runiform()
/* first approach is to fill the panel with `tsfill` */
/* then remove "seasonality" with `s.` */
tsfill
generate sx = sum(x)
generate ssx = s365.sx
/* second approach without `tsfill` */
/* but nested loop is fairly slow */
drop if missing(x)
generate double ssx_alt = 0
forvalues i = 1/`= _N' {
    local j = `i'
    local delta = time[`i'] - time[`j']
    while ((`j' > 0) & (`delta' < 365) & (id[`i'] == id[`j'])) {
        local x = cond(missing(x[`j']), 0, x[`j'])
        replace ssx_alt = ssx_alt + `x' in `i'
        local j = `j' - 1
        local delta = time[`i'] - time[`j']
    }
}
Answer 0 (score: 2)
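The sum over the last # days is the difference between two cumulative sums: the cumulative sum up to now and the cumulative sum # days ago. The extension to panel data is easy, but not shown here. Once you have applied tsfill, I don't think gaps affect this principle.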
. set obs 20
obs was 0, now 20
. gen t = _n
. gen y = 100 + _n
. gen sumy = sum(y)
. tsset t
time variable: t, 1 to 20
delta: 1 unit
. gen diff = sumy - L10.sumy
(10 missing values generated)
. l
+------------------------+
| t y sumy diff |
|------------------------|
1. | 1 101 101 . |
2. | 2 102 203 . |
3. | 3 103 306 . |
4. | 4 104 410 . |
5. | 5 105 515 . |
|------------------------|
6. | 6 106 621 . |
7. | 7 107 728 . |
8. | 8 108 836 . |
9. | 9 109 945 . |
10. | 10 110 1055 . |
|------------------------|
11. | 11 111 1166 1065 |
12. | 12 112 1278 1075 |
13. | 13 113 1391 1085 |
14. | 14 114 1505 1095 |
15. | 15 115 1620 1105 |
|------------------------|
16. | 16 116 1736 1115 |
17. | 17 117 1853 1125 |
18. | 18 118 1971 1135 |
19. | 19 119 2090 1145 |
20. | 20 120 2210 1155 |
+------------------------+