Python group ID for panel data

Time: 2017-04-06 15:46:30

Tags: python panel

I'm fairly new to Python, and I have a dataset like the following:

hhid psid year
 1    1   1989
 1    1   1991
 1    1   1993
 1    1   2000
 1    2   1989
 1    2   1991
 1    2   1993
 1    2   2000
 2    1   1989
 2    1   1991
 2    1   1993
 2    1   2000

 ... ...  ...
hhid=household ID
psid=personal ID within a household

My question is how to create a unique person ID (say, uid) across the whole panel dataset, so that it looks like this:

hhid psid year uid
 1    1   1989  1
 1    1   1991  1
 1    1   1993  1
 1    1   2000  1
 1    2   1989  2
 1    2   1991  2
 1    2   1993  2
 1    2   2000  2
 2    1   1989  3
 2    1   1991  3
 2    1   1993  3
 2    1   2000  3

2 answers:

Answer 0: (score: 0)

If you load the dataset into a pandas DataFrame df, you can try:

df['uid'] = df['hhid'].astype(str) + '_' + df['psid'].astype(str)

This produces:

+------+------+------+-----+
| hhid | psid | year | uid |
+------+------+------+-----+
|    1 |    1 | 1989 | 1_1 |
|    1 |    1 | 1991 | 1_1 |
|    1 |    1 | 1993 | 1_1 |
|    1 |    1 | 2000 | 1_1 |
|    1 |    2 | 1989 | 1_2 |
|    1 |    2 | 1991 | 1_2 |
|    1 |    2 | 1993 | 1_2 |
|    1 |    2 | 2000 | 1_2 |
|    2 |    1 | 1989 | 2_1 |
|    2 |    1 | 1991 | 2_1 |
|    2 |    1 | 1993 | 2_1 |
|    2 |    1 | 2000 | 2_1 |
+------+------+------+-----+
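The string key above identifies each person uniquely, but the question asks for consecutive integers. One possible way to get them is to pass that key to pd.factorize, which numbers values from 0 in order of first appearance. This is only a sketch: the DataFrame below is rebuilt from the sample rows in the question, and the variable names key, codes and uniques are illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question: three people, four survey years each
years = [1989, 1991, 1993, 2000]
df = pd.DataFrame({
    "hhid": [1] * 8 + [2] * 4,
    "psid": [1] * 4 + [2] * 4 + [1] * 4,
    "year": years * 3,
})

# Same string key as in the answer: household ID and person ID joined by "_"
key = df["hhid"].astype(str) + "_" + df["psid"].astype(str)

# factorize returns (codes, uniques); codes start at 0, so add 1
codes, uniques = pd.factorize(key)
df["uid"] = codes + 1

print(df)
```

Keeping the "_" separator matters here: without it, household 1 / person 11 and household 11 / person 1 would both collapse to the key "111".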

Answer 1: (score: 0)

There must be a simpler way, but I went with this:

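# On Python 3, zip() returns an iterator, so materialize it as a list first
combos = list(zip(df.hhid.tolist(), df.psid.tolist()))
# Number the unique (hhid, psid) pairs from 1 in order of first appearance
maps = zip(range(1, len(set(combos)) + 1), sorted(set(combos), key=combos.index))
final_maps = {k: v for v, k in maps}

# Look up each row's pair in the mapping to get its uid
df['uid'] = [final_maps[c] for c in combos]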
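As for the simpler way this answer hopes for, one candidate is pandas' own group numbering. The sketch below assumes a reasonably recent pandas version in which GroupBy.ngroup is available, and it rebuilds the sample data from the question. With sort=False the groups should be numbered in the order they first appear, which for this already sorted sample is the same as sorted order.

```python
import pandas as pd

# Sample data rebuilt from the question: three people, four survey years each
years = [1989, 1991, 1993, 2000]
df = pd.DataFrame({
    "hhid": [1] * 8 + [2] * 4,
    "psid": [1] * 4 + [2] * 4 + [1] * 4,
    "year": years * 3,
})

# ngroup() labels every row with its group's number, counting from 0;
# adding 1 gives IDs that start at 1 as in the desired output
df["uid"] = df.groupby(["hhid", "psid"], sort=False).ngroup() + 1

print(df)
```

This avoids building the mapping dictionary by hand, and it keeps uid as an integer column rather than a string.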