Question

I am working with a spell dataset of the following form:

    clear all

input persid    start   end t_start t_end   spell_type  year    spell_number    event
    1   8   9   44  45  1   1999    1   0
    1   12  12  60  60  1   2000    1   0
    1   1   1   61  61  1   2001    1   0
    1   7   11  67  71  1   2001    2   0
    1   1   4   85  88  2   2003    1   0
    1   5   7   89  91  1   2003    2   1
    1   8   11  92  95  2   2003    3   0
    1   1   1   97  97  2   2004    1   0
    1   1   3   121 123 1   2006    1   1
    1   4   5   124 125 2   2006    2   0
    1   6   9   126 129 1   2006    3   1
    1   10  11  130 131 2   2006    4   0
    1   12  12  132 132 1   2006    5   1
    1   1   12  157 168 1   2009    1   0
    1   1   12  169 180 1   2010    1   0
    1   1   12  181 192 1   2011    1   0
    1   1   12  193 204 1   2012    1   0
    1   1   12  205 216 1   2013    1   0
end

lab define lab_spelltype 1 "unemployment spell" 2 "employment spell"
lab val spell_type lab_spelltype

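where persid is the person's identifier; start and end are the months in which each yearly unemployment/employment spell begins and ends; t_start and t_end are the same measures, but counted in months from January 1996; and event equals 1 for employment entries whose previous row is an unemployment spell.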
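A quick sanity check on the two time scales may help here. The sketch below assumes, based on the example rows, that t_start and t_end are running month counts with January 1996 = 1. The names t_start_check, t_end_check, and spell_start are made up for illustration.

```stata
clear
input start end t_start t_end year
8  9  44  45  1999
5  7  89  91  2003
12 12 132 132 2006
end

* assumed convention: running month index with January 1996 = 1
gen t_start_check = (year - 1996)*12 + start
gen t_end_check   = (year - 1996)*12 + end
assert t_start_check == t_start & t_end_check == t_end

* the same months as Stata monthly dates (January 1960 = 0)
gen spell_start = ym(year, start)
format %tm spell_start
assert spell_start == t_start + ym(1995, 12)
list
```

If the asserts pass on the full data, t_start and t_end can be converted to Stata monthly dates by adding ym(1995, 12).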

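The data are such that there are no overlapping spells within a given year, and consecutive spells of the same type within a year have been merged.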

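My goal is, for every row where event is 1, to compute the number of months spent in employment in the past 6 and 24 months. In this specific example, what I would like to get is: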

clear all
input persid    start   end t_start t_end   spell_type  year    spell_number    event   empl_6  empl_24
    1   8   9   44  45  1   1999    1   0   .   .
    1   12  12  60  60  1   2000    1   0   .   .
    1   1   1   61  61  1   2001    1   0   .   .
    1   7   11  67  71  1   2001    2   0   .   .
    1   1   4   85  88  2   2003    1   0   .   .
    1   5   7   89  91  1   2003    2   1   0   5
    1   8   11  92  95  2   2003    3   0   .   .
    1   1   1   97  97  2   2004    1   0   .   .
    1   1   3   121 123 1   2006    1   1   0   0
    1   4   5   124 125 2   2006    2   0   .   .
    1   6   9   126 129 1   2006    3   1   3   3
    1   10  11  130 131 2   2006    4   0   .   .
    1   12  12  132 132 1   2006    5   1   4   7
    1   1   12  157 168 1   2009    1   0   .   .
    1   1   12  169 180 1   2010    1   0   .   .
    1   1   12  181 192 1   2011    1   0   .   .
    1   1   12  193 204 1   2012    1   0   .   .
    1   1   12  205 216 1   2013    1   0   .   .
end

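So my idea is that I have to go back to the rows preceding each event==1 entry and count how many months the individual was employed.

Could you suggest a way to obtain this final result? Some have suggested that I expand the dataset, but perhaps there are better ways to tackle the problem (especially since the dataset is very large).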

Edit

The correct labels for employment status are:

lab define lab_spelltype 1 "employment spell" 2 "unemployment spell"

With these labels, the number of past months in employment (empl_6 and empl_24) and the definition of event are now correct.

Answer 1

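The posted example is not very useful for developing and testing a solution, so I made up fake data with the same properties. Using 1 and 2 as the values of an indicator is bad practice, so I replaced it with an employment indicator where 1 means employed and 0 otherwise. Separate month and year variables are not useful either, so I use Stata monthly dates.

The first solution uses tsegen (from SSC) after expanding each spell to one observation per month. With panel data, all you need to do is sum the employment indicator over the desired time window.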

The second solution uses rangestat (also from SSC) and performs the same calculations without expanding the data at all. The idea is simple: if the end of a previous employment spell falls within the desired window, add that spell's duration. Of course, if the end of the spell falls within the window but its start does not, the months outside the window must be subtracted.

* fake data for 100 persons, up to 10 spells with no overlap
clear
set seed 123423
set obs 100
gen long persid = _n
gen spell_start = ym(runiformint(1990,2013),1)
expand runiformint(1,10)
bysort persid: gen spellid = _n
by persid: gen employed = runiformint(0,1)
by persid: gen spell_avg = int((ym(2015,12) - spell_start) / _N) + 1
by persid: replace spell_start = spell_start[_n-1] + ///
    runiformint(1,spell_avg) if _n > 1
by persid: gen spell_end = runiformint(spell_start, spell_start[_n+1]-1)
replace spell_end = spell_start + runiformint(1,12) if mi(spell_end)
format %tm spell_start spell_end

* an event is an employment spell that immediately follows an unemployment spell
by persid: gen event = employed & employed[_n-1] == 0

* expand to one obs per month and declare as panel data
expand spell_end - spell_start + 1
bysort persid spellid: gen ym = spell_start + _n - 1
format %tm ym
tsset persid ym

* only count employment months; limit results to first month event obs
tsegen m6 = rowtotal(L(1/6).employed)
tsegen m24 = rowtotal(L(1/24).employed)
bysort persid spellid (ym): replace m6 = . if _n > 1 | !event
bysort persid spellid (ym): replace m24 = . if _n > 1 | !event

* --------- redo using rangestat, without any monthly expansion ----------------

* return to original obs but keep first month results
bysort persid spellid: keep if _n == 1

* employment end and duration for employed observations only
gen e_end = spell_end if employed
gen e_len = spell_end - spell_start + 1 if employed

foreach target in 6 24 {

    // define interval bounds but only for event observations
    // an out-of-sample [0,0] interval will yield no results for non-events
    gen low`target' = cond(event, spell_start-`target', 0)
    gen high`target' = cond(event, spell_start-1, 0)

    // sum employment lengths and save earliest employment spell info
    rangestat (sum) empl`target'=e_len ///
        (firstnm) firste`target'=e_end firste`target'len=e_len, ///
        by(persid) interval(spell_end low`target' high`target')

    // remove from the count months that occur before lower bound
    gen e_start = firste`target' - firste`target'len + 1
    gen outside = low`target' - e_start
    gen empl`target'final = cond(outside > 0, empl`target'-outside, empl`target')
    replace empl`target'final = 0 if mi(empl`target'final) & event
    drop e_start outside
}

* confirm that we match the -tsegen- results
assert m24 == empl24final
assert m6 == empl6final
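To make the subtraction step concrete, here is a minimal sketch of the underlying arithmetic. It is a generic interval-overlap formula, not the rangestat code itself, and it also clips at the upper bound of the window. The variable names s, e, lo, hi, and inside are illustrative. The first row is the 1999-2001 employment spell from the question's example (months 67 to 71) against the 24-month window before the event at t_start = 89; the other two rows are made-up spells.

```stata
clear
input s e lo hi
67 71 65 88
60 67 65 88
40 50 65 88
end

* months of spell [s,e] that fall inside window [lo,hi]
gen inside = max(0, min(e, hi) - max(s, lo) + 1)
list

* fully inside: 5 months; starts before the window: 3 months; no overlap: 0
assert inside[1] == 5 & inside[2] == 3 & inside[3] == 0
```

The second row shows the case described above: the spell ends inside the window but starts 5 months before it, so only 3 of its 8 months count.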

Answer 2

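A solution to the problem is to:

expand the data so that there is one observation per month,
fill in the gap months with tsfill, and finally
use sum() and lag operators to get the running sums over the past 6 and 24 months.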

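See also Robert's solution, from which I borrowed some ideas.

Important: this is almost certainly not an efficient way to solve the problem, especially if the data are large (as in my case). The upside, however, is that you can actually "see" what happens in the background and make sure the final result is the desired one.

Also, importantly, this solution accounts for cases in which 2 (or more) events occur within 6 (or 24) months of each other.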
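The running-sum idea, including two events that fall within 6 months of each other, can be sketched on a single monthly series. This is only an illustration under stated assumptions: the employed indicator reproduces months 113 to 132 of the question's example (employment in 121-123, 126-129, and 132), and the names employed, cum, and m6 are made up here. It uses L1.cum - L7.cum, which equals the cumsum-1 - L7.cumsum form whenever the current month is itself an employment month.

```stata
clear
set obs 20
gen month = 112 + _n                 // months 113..132
gen employed = inrange(month,121,123) | inrange(month,126,129) | month == 132
tsset month

* running count of employment months up to and including each month
gen cum = sum(employed)

* employment months in [t-6, t-1] = cum(t-1) - cum(t-7)
gen m6 = L1.cum - L7.cum

* the events starting in months 126 and 132 are only 6 months apart
list month employed cum m6 if inlist(month, 121, 126, 132)
assert m6 == 0 if month == 121
assert m6 == 3 if month == 126
assert m6 == 4 if month == 132
```

The results match empl_6 in the question for the three events in 2006 (0, 3, and 4).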

clear all

input persid    start   end t_start t_end   spell_type  year    spell_number    event
    1   8   9   44  45  1   1999    1   0
    1   12  12  60  60  1   2000    1   0
    1   1   1   61  61  1   2001    1   0
    1   7   11  67  71  1   2001    2   0
    1   1   4   85  88  2   2003    1   0
    1   5   7   89  91  1   2003    2   1
    1   8   11  92  95  2   2003    3   0
    1   1   1   97  97  2   2004    1   0
    1   1   3   121 123 1   2006    1   1
    1   4   5   124 125 2   2006    2   0
    1   6   9   126 129 1   2006    3   1
    1   10  11  130 131 2   2006    4   0
    1   12  12  132 132 1   2006    5   1
    1   1   12  157 168 1   2009    1   0
    1   1   12  169 180 1   2010    1   0
    1   1   12  181 192 1   2011    1   0
    1   1   12  193 204 1   2012    1   0
    1   1   12  205 216 1   2013    1   0
end

lab define lab_spelltype 1 "employment" 2 "unemployment"
lab val spell_type lab_spelltype
list

* generate Stata monthly dates
gen spell_start = ym(year,start)
gen spell_end = ym(year,end)
format %tm spell_start spell_end
list

* expand to monthly data
gen n = spell_end - spell_start + 1
expand n, gen(expanded)
sort persid year spell_number expanded
bysort persid year spell_number: gen month = spell_start + _n - 1
by persid year spell_number: replace event = 0 if _n > 1
format %tm month

* xtset, fill months gaps with "empty" rows, use lags and cumsum to count past months in employment
xtset persid month, monthly // %tm format
tsfill
bysort persid (month): gen cumsum = sum(spell_type) if spell_type==1
bysort persid (month): replace cumsum = cumsum[_n-1] if cumsum==.
bysort persid (month): gen m6  = cumsum-1 - L7.cumsum if event==1  // "-1" otherwise it sums also current empl month
bysort persid (month): gen m24 = cumsum-1 - L25.cumsum if event==1
drop if event==.
list persid start end year m* if event

Spell data management: number of months spent in a particular state in the past 24 months
