Question

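I have panel data running from year t1 to year t2. Some individuals enter the sample after t1 and/or leave it before t2. For efficiency (the sample is large), the dataset contains rows only for the years in which an individual is observed.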

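I want to add one new observation per individual, holding the year after that individual left the sample. So if someone leaves in, say, 2003, the new observation should contain the individual's id and the value 2004 in the year variable. All other variables in that observation should be missing.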

Here is my approach, using an example dataset:

webuse nlswork, clear

* Here goes plenty of lines of codes modifying the dataset ... for generality *

timer on 1

preserve
keep id year
bysort id (year) : keep if _n == _N
replace year = year + 1
save temp.dta, replace
restore

append using temp.dta
sort id year
erase temp.dta

timer off 1
timer list

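I suspect this is inefficient, because it involves a preserve/restore, saving and erasing an extra dataset, and an append, all of which are relatively time-consuming. Something like tsfill, last would be great, but that option does not exist. Does anyone know a more efficient way? The code above includes a timer, so anyone can benchmark an alternative against it.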

Answer 1

I have never been impressed by efforts to save seconds when writing the code takes minutes. This is more direct than your approach.

bysort id (year) : gen byte last = _n == _N 
expand 2 if last 
bysort id (year) : replace year = year + 1 if _n == _N
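
To see what these three lines do, here is a minimal sketch on a toy panel. The dataset and the variable wage are made up for illustration and are not part of the question's data. After it runs, each id should have one extra row whose year is one more than its last observed year; at this stage that row still carries copies of the other variables' values.

```stata
* Hypothetical toy panel: id 1 observed 2001-2003, id 2 observed 2002-2003
clear
input id year wage
1 2001 10
1 2002 11
1 2003 12
2 2002 20
2 2003 21
end

* Flag each individual's last observed year
bysort id (year) : gen byte last = _n == _N

* Duplicate the flagged row, then push one of the two copies forward a year
expand 2 if last
bysort id (year) : replace year = year + 1 if _n == _N

list, sepby(id)
```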

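EDIT: You need to loop over the other variables in the dataset to replace their values with missing. For simplicity, I will assume they are all numeric.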

bysort id (year) : replace last = _n == _N 
ds id year, not 
quietly foreach v in `r(varlist)' { 
    replace `v' = . if last 
}
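
If some variables might be strings, the loop can branch on the storage type. The following is a sketch of one possible variant, not the code benchmarked above. It uses the generate() option of expand to mark the duplicated rows, so no second bysort is needed. The variable name added is an arbitrary choice, and idcode is the full name of the identifier in nlswork (the code above abbreviates it to id).

```stata
webuse nlswork, clear

timer clear 2
timer on 2

* Flag each individual's last observed year
bysort idcode (year) : gen byte last = _n == _N

* Duplicate that row; added is 1 for the copy and 0 for original rows
expand 2 if last, generate(added)
replace year = year + 1 if added
drop last

* Blank out every other variable in the added rows
ds idcode year added, not
foreach v in `r(varlist)' {
    capture confirm string variable `v'
    if _rc == 0 {
        quietly replace `v' = "" if added
    }
    else {
        quietly replace `v' = . if added
    }
}

sort idcode year

timer off 2
timer list
```

Timer 2 is used here so the result can be set beside timer 1 from the question in the same session.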

Adding observations to a panel in Stata
