I have an SPSS dataset with more than 5,000 cases that looks like this:
ID, relation to head of household
1, head of household
1, son
1, partner
2, head of household
2, son
3, head of household
3, son
3, cousin
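I need to count the number of households that contain particular combinations of relations. I know this should be done using ID as the break variable, but I cannot figure out how.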
Answer 0: (score: 1)
One approach is to create a set of dummy variables, one per category, and then use AGGREGATE to get household-level statistics.
DATA LIST LIST (",") /ID (F1.0) Relation (A20).
BEGIN DATA
1,head of household
1,son
1,partner
2,head of household
2,son
3,head of household
3,son
3,cousin
END DATA.
DATASET NAME Houses.
*Making dummy variables.
COMPUTE HeadHouse = (Relation = "head of household").
COMPUTE Partner = (Relation = "partner").
COMPUTE Child = (Relation = "son").
COMPUTE Relative = (Relation = "cousin").
*Aggregating to one row per household (ID), each SUM counts the members of that type.
DATASET DECLARE AggHouse.
AGGREGATE OUTFILE='AggHouse'
/BREAK ID
/HeadHouse = SUM(HeadHouse)
/Partner = SUM(Partner)
/Child = SUM(Child)
/Relative = SUM(Relative).
With the aggregated dataset, you can then use IF statements to flag the conditions you need. For example:
DATASET ACTIVATE AggHouse.
IF (HeadHouse > 0) AND (Child > 0) First = 1.
IF (HeadHouse > 0) AND (Partner > 0) AND (Child > 0) Second = 1.
For your real dataset you will need to add more conditions to the original set of dummy variables, but I leave that as an exercise for you.
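To get from per-household flags to the counts the question asks for, one option is to compute each flag as a 0/1 variable and tabulate it. This is only a sketch: it assumes the AggHouse dataset built above is available, and the names HeadAndChild and HeadPartnerChild are made up for illustration.

```spss
* Sketch only, assumes the AggHouse dataset created above.
DATASET ACTIVATE AggHouse.
* A logical expression evaluates to 1 when true and 0 when false.
* HeadAndChild and HeadPartnerChild are hypothetical variable names.
COMPUTE HeadAndChild = (HeadHouse > 0 AND Child > 0).
COMPUTE HeadPartnerChild = (HeadHouse > 0 AND Partner > 0 AND Child > 0).
* The frequency of value 1 in each table is the number of matching households.
FREQUENCIES VARIABLES=HeadAndChild HeadPartnerChild.
```

With the eight example rows, all three households have both a head and a son, and only household 1 also has a partner, so the tables should show 3 and 1 households respectively. Using COMPUTE rather than IF means non-matching households get 0 instead of system-missing, which keeps them visible in the frequency table.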