Question

I have a dataset like this.

FamilyID     Status          personID  spouseID  HeadID spouse_of_referenceID    income
   1          Head              1        2          1       2
   1          Spouse of head    2        1          1       2
   1          Child             3        NA         1       2
   2          Head              1        3          1       3
   2          Spouse of head    3        1          1       3

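For each "Child", I want to create a variable "parents' income", which is the sum of the income of the head and the income of the spouse of the head.

I was thinking of something like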

bysort family: egen parentsincome =   if ??? status==4

because status is 4 if the person is a child.

But I don't know how to proceed from there. I considered using _n, but I couldn't come up with a real solution.

Answer 1

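This is a weak data example: 7 variables are declared, but each row supplies only 6 values, and it is not in Stata form. "NA" is not a Stata code for missing. Some engineering was needed to make sense of it. Statalist gives advice on preparing data examples that applies here too: see its advice on Stata data examples.

You can use egen directly to get the total over those whose status is head or spouse of head.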

clear 
input FamilyID str14 Status personID spouseID HeadID spouse_of_referenceID income
 1 "Head" 1 2 1 2  1000 
 1 "Spouse of head" 2 1 1 2 2000 
 1 "Child" 3 . 1 2   0 
 2 "Head" 1 3 1 3  3000 
 2 "Spouse of head" 3 1 1 3 4000 
end 

egen HSIncome = total(income / inlist(Status, "Head", "Spouse of head")), by(FamilyID ) 

list FamilyID Status personID income HSIncome, sepby(FamilyID) 

     +----------------------------------------------------------+
     | FamilyID           Status   personID   income   HSIncome |
     |----------------------------------------------------------|
  1. |        1             Head          1     1000       3000 |
  2. |        1   Spouse of head          2     2000       3000 |
  3. |        1            Child          3        0       3000 |
     |----------------------------------------------------------|
  4. |        2             Head          1     3000       7000 |
  5. |        2   Spouse of head          3     4000       7000 |
     +----------------------------------------------------------+
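The total above is attached to every family member. As a minimal sketch continuing from the example data, it can be restricted to children, as the question asks; the variable names parentsincome and HSIncome2 are illustrative. The second command is an equivalent way of writing the same total: dividing by a false (0) condition yields missing, which total() ignores, and cond() makes that explicit.

* keep the head-plus-spouse total only for children; missing for everyone else
gen parentsincome = HSIncome if Status == "Child"

* equivalent to the division trick: income where the condition is true, missing otherwise
egen HSIncome2 = total(cond(inlist(Status, "Head", "Spouse of head"), income, .)), by(FamilyID)
assert HSIncome2 == HSIncome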

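See e.g. this paper, Sections 9 and 10, for a review of technique.

If you are using value labels to show status, the code will naturally be different.

The help for egen is explicit that you should not try to use _n with it. This is because egen often sorts the data temporarily, so observations may change order within the dataset.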
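As a rough sketch of that case, suppose status is a numeric variable with value labels attached. Only the code 4 for a child comes from the question; the codes 1 for head and 2 for spouse of head, the label name statuslbl, and the income figures are assumptions made up for illustration.

* hypothetical data: status is numeric; 1 = Head and 2 = Spouse of head are assumed codes
clear
input FamilyID status income
1 1 1000
1 2 2000
1 4 0
2 1 3000
2 2 4000
end

* attach value labels so that listings show text rather than numbers
label define statuslbl 1 "Head" 2 "Spouse of head" 4 "Child"
label values status statuslbl

* same division trick, now testing the numeric codes
egen HSIncome = total(income / inlist(status, 1, 2)), by(FamilyID)

* 4 = Child, as stated in the question
gen parentsincome = HSIncome if status == 4

list, sepby(FamilyID)

The comparison is made on the underlying numeric values, not on the label text, so the codes in inlist() would need to match whatever the real dataset uses.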
