I'm fairly new to Python, and I have a dataset like the following:
hhid psid year
1 1 1989
1 1 1991
1 1 1993
1 1 2000
1 2 1989
1 2 1991
1 2 1993
1 2 2000
2 1 1989
2 1 1991
2 1 1993
2 1 2000
... ... ...
hhid=household ID
psid=personal ID within a household
My question is how to create a unique personal ID (say, uid) for this panel dataset, so that it looks like:
hhid psid year uid
1 1 1989 1
1 1 1991 1
1 1 1993 1
1 1 2000 1
1 2 1989 2
1 2 1991 2
1 2 1993 2
1 2 2000 2
2 1 1989 3
2 1 1991 3
2 1 1993 3
2 1 2000 3
Answer 0: (score: 0)
If you load the dataset into a pandas DataFrame df, you can try:
df['uid'] = df['hhid'].astype(str) + '_' + df['psid'].astype(str)
which produces:
+------+------+------+-----+
| hhid | psid | year | uid |
+------+------+------+-----+
| 1 | 1 | 1989 | 1_1 |
| 1 | 1 | 1991 | 1_1 |
| 1 | 1 | 1993 | 1_1 |
| 1 | 1 | 2000 | 1_1 |
| 1 | 2 | 1989 | 1_2 |
| 1 | 2 | 1991 | 1_2 |
| 1 | 2 | 1993 | 1_2 |
| 1 | 2 | 2000 | 1_2 |
| 2 | 1 | 1989 | 2_1 |
| 2 | 1 | 1991 | 2_1 |
| 2 | 1 | 1993 | 2_1 |
| 2 | 1 | 2000 | 2_1 |
+------+------+------+-----+
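The string key above identifies each person uniquely, but it is not the sequential integer shown in the question. One possible way to turn it into one is pd.factorize, which numbers distinct values in order of first appearance. This is only a sketch: the sample frame is copied from the question, and the intermediate name key is my own, not part of the answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "hhid": [1] * 8 + [2] * 4,
    "psid": [1] * 4 + [2] * 4 + [1] * 4,
    "year": [1989, 1991, 1993, 2000] * 3,
})

# Same string key as in the answer above ("key" is a hypothetical name)
key = df["hhid"].astype(str) + "_" + df["psid"].astype(str)

# factorize returns (codes, uniques); codes start at 0 and follow
# the order in which each distinct key first appears
codes, _ = pd.factorize(key)
df["uid"] = codes + 1  # shift so the IDs start at 1

print(df)
```

With the sample data, the three (hhid, psid) pairs get uid 1, 2 and 3, which matches the layout asked for in the question.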
Answer 1: (score: 0)
There must be a simpler way, but I went with this:
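# Build the (hhid, psid) pairs as a list; in Python 3, zip returns a
# one-shot iterator that has no .index method
combos = list(zip(df.hhid.tolist(), df.psid.tolist()))
# Unique pairs, kept in order of first appearance
unique_combos = sorted(set(combos), key=combos.index)
# Map each pair to a sequential integer starting at 1
final_maps = {combo: uid for uid, combo in enumerate(unique_combos, start=1)}
df['uid'] = [final_maps[combo] for combo in combos]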
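As for the simpler way: one candidate, as far as I can tell, is groupby(...).ngroup(), which numbers each group from 0. The sketch below assumes pandas 0.20.1 or later (where ngroup is available) and rebuilds the sample data from the question; treat it as an alternative to the code above, not a drop-in equivalent for every dataset.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "hhid": [1] * 8 + [2] * 4,
    "psid": [1] * 4 + [2] * 4 + [1] * 4,
    "year": [1989, 1991, 1993, 2000] * 3,
})

# ngroup() labels every row with its group's number, starting at 0.
# sort=False numbers groups by first appearance instead of by sorted key.
df["uid"] = df.groupby(["hhid", "psid"], sort=False).ngroup() + 1

print(df)
```

Dropping sort=False numbers the groups by sorted (hhid, psid) instead, which gives the same result here because the sample is already sorted.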