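I want to create a new column in my cross-sectional survey dataset containing the education of each woman's husband. I have household (hid) and individual (HL1) IDs, along with each person's sex (HL4), age (HL6) and education (ED4A), whether a woman is currently married (MA1), and her husband's age (MA2).
Essentially, I want code that takes each currently married woman, finds the male in her household whose age equals her husband's age, and copies his education to her row.
I tried this, but it does not work: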
bysort hid (HL6) : gen husb_educ = ED4A[MA2]
Here is a sample from the dataset:
+-----+----------+-----+-----+--------+-----+----------+
| HL1 | MA1 | MA2 | hid | HL4 | HL6 | ED4A |
+-----+----------+-----+-----+--------+-----+----------+
| 1 | | | 106 | Male | 57 | Diploma |
| 2 | | | 106 | Female | 53 | Intermed |
| 3 | | | 106 | Male | 30 | Higher S |
| 4 | No, not | | 106 | Female | 24 | Bachelor |
| 5 | | | 106 | Male | 22 | Diploma |
| 6 | | | 106 | Male | 17 | Secondar |
| 7 | | | 106 | Female | 10 | Primary |
| 8 | Yes, cur | 22 | 106 | Female | 23 | Diploma |
| 9 | | | 106 | Female | 0 | |
+-----+----------+-----+-----+--------+-----+----------+
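So in this example, I want a new column holding the husband's education, with Diploma as its value in row 8 (because this woman's husband is 22 years old, and the 22-year-old male in the household has a Diploma).
The same sample, without value labels: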
+-----+-----+-----+-----+-----+-----+------+
| HL1 | MA1 | MA2 | hid | HL4 | HL6 | ED4A |
+-----+-----+-----+-----+-----+-----+------+
| 1 | | | 106 | 1 | 57 | 4 |
| 2 | | | 106 | 2 | 53 | 2 |
| 3 | | | 106 | 1 | 30 | 6 |
| 4 | 3 | | 106 | 2 | 24 | 5 |
| 5 | | | 106 | 1 | 22 | 4 |
| 6 | | | 106 | 1 | 17 | 3 |
| 7 | | | 106 | 2 | 10 | 1 |
| 8 | 1 | 22 | 106 | 2 | 23 | 4 |
| 9 | | | 106 | 2 | 0 | |
+-----+-----+-----+-----+-----+-----+------+
A particularly large household:
input
HL1 MA1 MA2 hid HL4 HL6 ED4A
1 . . 365809 1 33 1
2 1 33 365809 2 26 1
1 . . 365810 1 58 1
2 . . 365810 2 54 .
3 . . 365810 1 23 3
4 . . 365810 1 23 2
5 . . 365810 1 18 3
6 . . 365810 1 15 2
7 . . 365810 2 12 2
8 . . 365810 1 33 3
9 1 dk 365810 2 31 1
10 . . 365810 2 13 2
11 . . 365810 2 11 1
12 . . 365810 1 9 1
13 . . 365810 1 6 1
14 . . 365810 2 3 .
15 . . 365810 1 2 .
16 . . 365810 1 33 3
17 1 33 365810 2 30 1
18 . . 365810 1 8 1
19 . . 365810 2 6 1
20 . . 365810 2 5 .
21 . . 365810 1 1 .
22 . . 365810 1 32 4
23 1 32 365810 2 30 1
24 . . 365810 1 5 .
25 . . 365810 2 3 .
26 . . 365810 1 2 .
27 . . 365810 1 30 4
28 1 30 365810 2 28 1
29 . . 365810 2 2 .
30 . . 365810 1 0 .
31 . . 365810 1 27 2
32 1 27 365810 2 27 1
33 . . 365810 2 2 .
34 . . 365810 2 0 .
end
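Answer 0 (score: 0)
Since you have already outlined the steps needed to do what you want, writing a simple script should not be a problem. In my experience, the syntax is easier to learn if you write and run each step separately (and look at what happens after each one, whether any errors were introduced, and so on). Once you have the hang of it, you can shrink the code down to a single line. Something like this should work (trying to follow the steps in your question):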
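* Step 1: look at wives who are currently married
* (codes taken from the unlabelled sample: MA1 == 1 is currently married, HL4 == 2 is female, HL4 == 1 is male)
* Not strictly necessary, as only married women have MA2, but the next step takes only married women into account
* Step 2: generate a husband's age variable and spread it to the whole household (new variable, so the original MA2 stays untouched)
* Note: max() keeps one husband's age per household, so this handles only one married couple per household
gen husband_age = MA2 if MA1 == 1 & HL4 == 2
bys hid: egen husband_age_hid = max(husband_age)
* Step 3: mark which individual is the husband
* (assumed this is what was meant by pairing the husband's age with the age of a male in the household)
gen husband = 0
bys hid: replace husband = 1 if HL4 == 1 & HL6 == husband_age_hid & !missing(husband_age_hid)
* Step 4: copy the husband's education to the whole household
gen husband_ED4 = ED4A if husband == 1
bys hid: egen husb_educ = max(husband_ED4)
* Data cleaning, if necessary (husb_educ does not match the husband* pattern, so it is kept)
drop husband*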
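It might be better to use tempvars rather than generating new variables in the first steps, but I thought these variables might be useful later.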
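As a rough sketch of the tempvar idea, the same steps could look like the block below. It assumes the numeric codes from the unlabelled sample (MA1 == 1 for currently married, HL4 == 1 for male, HL4 == 2 for female) and at most one married couple per household. The temporary names wife_hage, hh_hage and husb_ed are made up for illustration.

```stata
* Sketch only: same logic as above, but with temporary variables
tempvar wife_hage hh_hage husb_ed

* husband's age as reported by the married woman
gen `wife_hage' = MA2 if MA1 == 1 & HL4 == 2

* spread that age to every member of the household
bysort hid : egen `hh_hage' = max(`wife_hage')

* education of the male whose age equals the husband's age
gen `husb_ed' = ED4A if HL4 == 1 & HL6 == `hh_hage' & !missing(`hh_hage')

* copy it to the whole household; only husb_educ remains afterwards
bysort hid : egen husb_educ = max(`husb_ed')
```

Temporary variables are dropped automatically when the do-file or program that created them ends, so the block has to be run in one go rather than line by line.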
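Answer 1 (score: 0)
Here is a start. The code loops over the different married women in each household, but it does not work if two or more males match the husband's age.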
input HL1 MA1 MA2 hid HL4 HL6 ED4A
1 . . 106 1 57 4
2 . . 106 2 53 2
3 . . 106 1 30 6
4 3 . 106 2 24 5
5 . . 106 1 22 4
6 . . 106 1 17 3
7 . . 106 2 10 1
8 1 22 106 2 23 4
9 . . 106 2 0 .
end
bysort hid (MA1) : gen wid = _n if MA1 == 1
su wid, meanonly
local max = r(max)
gen heducation = .
quietly forval i = 1/`max' {
bysort hid : egen hage = min(cond(wid == `i', MA2, .))
by hid : egen nmatches = total(HL4 == 1 & HL6 == hage)
by hid : egen work = min(cond(nmatches == 1 & HL6 == hage, ED4A, .))
replace heducation = work if wid == `i'
drop hage nmatches work
}
sort hid HL1
list
+-----------------------------------------------------------+
| HL1 MA1 MA2 hid HL4 HL6 ED4A wid heduca~n |
|-----------------------------------------------------------|
1. | 1 . . 106 1 57 4 . . |
2. | 2 . . 106 2 53 2 . . |
3. | 3 . . 106 1 30 6 . . |
4. | 4 3 . 106 2 24 5 . . |
5. | 5 . . 106 1 22 4 . . |
|-----------------------------------------------------------|
6. | 6 . . 106 1 17 3 . . |
7. | 7 . . 106 2 10 1 . . |
8. | 8 1 22 106 2 23 4 1 4 |
9. | 9 . . 106 2 0 . . . |
+-----------------------------------------------------------+
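(Update)
The extended example revealed a bug: one calculation was not restrictive enough and did not exclude females of the same age. (Incidentally, note that the new data are for two households, not one.)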
bysort hid (MA1) : gen wid = _n if MA1 == 1
su wid, meanonly
local max = r(max)
gen heducation = .
quietly forval i = 1/`max' {
bysort hid : egen hage`i' = min(cond(wid == `i', MA2, .))
by hid : egen nmatches`i' = total(HL4 == 1 & HL6 == hage`i')
by hid : egen work`i' = min(cond(nmatches`i' == 1 & HL6 == hage`i' & HL4 == 1, ED4A, .))
replace heducation = work`i' if wid == `i'
}
sort hid wid HL1
list hid wid MA2 HL6 ED4A heducation HL4 if inlist(HL6, 27, 30, 32, 33) | MA2 < ., sepby(hid)
+--------------------------------------------------+
| hid wid MA2 HL6 ED4A heduca~n HL4 |
|--------------------------------------------------|
1. | 365809 1 33 26 1 1 2 |
2. | 365809 . . 33 1 . 1 |
|--------------------------------------------------|
3. | 365810 1 27 27 1 2 2 |
4. | 365810 2 33 30 1 . 2 |
5. | 365810 3 32 30 1 4 2 |
6. | 365810 4 30 28 1 4 2 |
14. | 365810 . . 33 3 . 1 |
21. | 365810 . . 33 3 . 1 |
26. | 365810 . . 32 4 . 1 |
30. | 365810 . . 30 4 . 1 |
33. | 365810 . . 27 2 . 1 |
+--------------------------------------------------+
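A loop-free alternative, offered only as a sketch and not as part of the answer above, is to build a lookup file of males whose age is unique within their household and merge it back on household and husband's age. It assumes MA2 is numeric (so the "dk" entry is stored as missing) and HL4 == 1 codes males. The names heducation2 and males are hypothetical.

```stata
* Sketch only: lookup file of uniquely aged males, merged on hid and MA2
preserve
keep if HL4 == 1 & !missing(HL6)       // males with a known age
keep hid HL6 ED4A
bysort hid HL6 : keep if _N == 1       // drop ages shared by 2+ males in a household
rename (HL6 ED4A) (MA2 heducation2)    // key now lines up with the wife's MA2
tempfile males
save `males'
restore

* attach the matching male's education to each row by household and husband's age
merge m:1 hid MA2 using `males', keep(master match) nogenerate
sort hid HL1
```

As with the loop, a woman whose husband's age matches two or more males, or none, is left with a missing value. If heducation from the loop is still in memory, compare heducation heducation2 is one way to check whether the two approaches agree.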
For a more general discussion, see