Say I have this MWE:
clear all
input str2 person enr_year enr_term
"a" 2000 1
"a" 2000 2
"a" 2000 2
"a" 2000 3
"a" 2000 3
"a" 2001 1
"a" 2001 2
"a" 2001 3
"a" 2002 2
"a" 2002 2
"a" 2003 2
"a" 2006 1
"a" 2006 2
"a" 2008 2
"b" 2000 2
"b" 2001 3
end
label define term 1 "Summer" 2 "Fall" 3 "Spring"
label values enr_term term
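Some explanation is in order. This is school enrollment data. person is a person, and everything needs to be done within person. enr_year is an academic year. enr_term is an academic term. Summer and Fall come before Spring because the year is an academic year, not a calendar year.
Each row in the data implicitly indicates that the person was enrolled in the given year and term.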
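My task is to create two indicator variables: enr_this_spring and enr_next_fall. I can get enr_this_spring successfully. I have included the code for it in case the logic helps with working out how to get enr_next_fall.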
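These indicator variables should be created only for observations of Fall enrollment.
enr_this_spring indicates that the person enrolled in the Spring that follows. Because we only set this variable on Fall terms, it is 1 if there is a Spring observation in the same academic year. Otherwise it is 0, even if there is a Spring observation in the next year.
enr_next_fall is 1 if there is a Fall observation in the following year. As described below, I am not sure how to handle a student who enrolls in the Fall of year x and then not in the Fall of x + 1 but in the Fall of x + n, where n > 1.
If there are two Fall observations within the same year (multiple enrollments, perhaps because the student attended two schools at once), they should both have the same values.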
This is what I want to get:
clear all
input str2 person enr_year enr_term enr_this_spring enr_next_fall
"a" 2000 1 . . // missing because not Fall
"a" 2000 2 1 1 // 1 b/c a/2000/3; 1 b/c a/2001/2
"a" 2000 2 1 1 // same reasons as line directly above
"a" 2000 3 . . // missing because not Fall
"a" 2000 3 . . // missing because not Fall
"a" 2001 1 . . // missing because not Fall
"a" 2001 2 1 1 // 1 b/c a/2001/3; 1 b/c a/2002/2
"a" 2001 3 . . // missing because not Fall
"a" 2002 2 0 1 // 0 b/c no a/2002/3; 1 b/c a/2003/2
"a" 2002 2 0 1 // same reasons as line directly above
"a" 2003 2 0 0 // 0 b/c no a/2003/3; 0 b/c no a/2004/2
"a" 2006 1 . . // missing because not Fall
"a" 2006 2 0 0 // 0 b/c no a/2006/3; 0 b/c no a/2007/2
"a" 2008 2 0 0 // 0 b/c no a/2008/3; 0 b/c no a/2009/2
"b" 2000 2 0 0 // 0 b/c no b/2000/3; 0 b/c no b/2001/2
"b" 2001 3 . . // missing because not Fall
end
label define term 1 "Summer" 2 "Fall" 3 "Spring"
label values enr_term term
Starting from the original data, I can first get enr_this_spring like this:
*Create indicators for if the term is spring and if term is fall
gen is_spring = enr_term == 3
gen is_fall = enr_term ==2
*Get the maximum value, within person and year
bys person enr_year: egen enr_this_spring = max(is_spring)
replace enr_this_spring=. if is_fall!=1
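I am not sure how to create an indicator for whether the person enrolled in the following Fall.
Here is what I tried; after the code I explain why it does not work: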
*Preserve the data. We are going to process it and merge back on
preserve
*We only are concerned about fall attendance for this part
keep if enr_term==2
*We only want one observation per term, as duplicates mess up the code
bys person enr_year enr_term: keep if _n==1
*Make a variable that is a constant 1
gen one = 1
*Make a variable, enr_next_fall that is 1 if the person enrolled in the fall
* in the following observation. Note that we do this within group and sort
* by enr_year
bys person (enr_year): gen enr_next_fall = one[_n+1]
* Replace missing with 0. This only affects the final observation within group
replace enr_next_fall = 0 if missing(enr_next_fall)
*Create temporary file, to be merged on
tempfile a
save `a'
restore
*Merge on the temporary file
merge m:1 person enr_year enr_term using `a'
drop is_spring is_fall one _merge
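This does not do what I need when the person is not enrolled the following Fall but comes back later, because one[_n+1] is 1 whenever any later Fall row exists, whatever the gap in years. Maybe they were sick and missed an entire academic year. How can I fix this?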
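To make the failure concrete, here is a small sketch on a reduced data set holding only one row per person and Fall year from the MWE. The variable names gap, any_later_fall and fall_next_year are made up for this illustration and do not appear in the code above.

```stata
* Sketch only: one row per person and Fall year, taken from the MWE
clear
input str2 person enr_year
"a" 2000
"a" 2001
"a" 2002
"a" 2003
"a" 2006
"a" 2008
"b" 2000
end
* Years until the next Fall enrollment (missing on each person's last row)
bysort person (enr_year): gen gap = enr_year[_n+1] - enr_year
* What one[_n+1] effectively flags: any later Fall row at all
gen byte any_later_fall = !missing(gap)
* What is actually wanted: a Fall row exactly one year later
gen byte fall_next_year = gap == 1
list, sepby(person)
```

The two flags disagree for a/2003 and a/2006, which are exactly the rows where the student skipped at least one academic year.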
Answer 0 (score: 0)
I think I've figured it out:
clear all
input str2 person enr_year enr_term
"a" 2000 1
"a" 2000 2
"a" 2000 2
"a" 2000 3
"a" 2000 3
"a" 2001 1
"a" 2001 2
"a" 2001 3
"a" 2002 2
"a" 2002 2
"a" 2003 2
"a" 2006 1
"a" 2006 2
"a" 2008 2
"b" 2000 2
"b" 2001 3
end
label define term 1 "Summer" 2 "Fall" 3 "Spring"
label values enr_term term
*Create indicators for if the term is spring and if term is fall
gen is_spring = enr_term == 3
gen is_fall = enr_term ==2
*Get the maximum value, within person and year
bys person enr_year: egen enr_this_spring = max(is_spring)
replace enr_this_spring=. if is_fall!=1
*Create enr_next_fall variable. Merge back on
preserve
keep if enr_term==2
bys person enr_year: keep if _n==1
bys person (enr_year): gen next = enr_year[_n+1]
replace next = next - 1
gen enr_next_fall = enr_year==next
drop next
tempfile fall
save `fall'
restore
merge m:1 person enr_year using `fall'
drop _merge
replace enr_next_fall = . if enr_term!=2
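For comparison, the same indicator can probably be built without preserve and merge. The following is only a sketch of one possible alternative, not the method used above; the helper variables has_fall, first and nf are hypothetical names introduced here. It tags one row per person and year, compares each tagged year with the next tagged year, and then spreads the result to every row of that year.

```stata
clear all
input str2 person enr_year enr_term
"a" 2000 1
"a" 2000 2
"a" 2000 2
"a" 2000 3
"a" 2000 3
"a" 2001 1
"a" 2001 2
"a" 2001 3
"a" 2002 2
"a" 2002 2
"a" 2003 2
"a" 2006 1
"a" 2006 2
"a" 2008 2
"b" 2000 2
"b" 2001 3
end
* Does this person-year contain any Fall row?
bysort person enr_year: egen byte has_fall = max(enr_term == 2)
* Tag exactly one row per person-year
egen byte first = tag(person enr_year)
* Among tagged rows (one per year, sorted by year), flag years whose
* next tagged year is exactly one later and contains a Fall row
bysort person first (enr_year): gen byte nf = first == 1 ///
    & has_fall[_n+1] == 1 & enr_year[_n+1] == enr_year + 1
* Spread the flag to every row of the person-year, then blank non-Fall rows
bysort person enr_year: egen byte enr_next_fall = max(nf)
replace enr_next_fall = . if enr_term != 2
drop has_fall first nf
```

On the MWE this should reproduce the enr_next_fall column from the desired output. Whether it is preferable to the merge-based version is a matter of taste; it avoids the temporary file at the cost of three helper variables.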