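I have a Pandas DataFrame with unique records, but I need to create a unique key based on one of its columns. Below is sample data and my attempt to create a second column by iterating over the data and incrementing a count by 1. My plan is to join the two columns to create the unique key.
Question: Is there a better way to do this? What is wrong with my approach?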
import pandas as pd
import numpy as np
d = {'subid': {0: '327598650129611740', 1: '327598650129611740', 2: '327559921352747760', 3: '327676431535405027', 4: '327676431535405027', 5: '327676431535405027', 6: '327662567602840733', 7: '327778468325442201', 8: '327777161261272775', 9: '327777161261272775'}}
df = pd.DataFrame(d)
old_index = 0
child_no = 1
for subid, row in df.iterrows():
    if subid == old_index:
        df['child_no'] = child_no + 1
        old_index = subid
        child_no = child_no + 1
    else:
        child_no = 1
        df['child_no'] = child_no
        old_index = subid
df
                subid  child_no
0  327598650129611740         1
1  327598650129611740         1
2  327559921352747760         1
3  327676431535405027         1
4  327676431535405027         1
5  327676431535405027         1
6  327662567602840733         1
7  327778468325442201         1
8  327777161261272775         1
9  327777161261272775         1
Desired result:
                subid  child_no
0  327598650129611740         1
1  327598650129611740         2
2  327559921352747760         1
3  327676431535405027         1
4  327676431535405027         2
5  327676431535405027         3
6  327662567602840733         1
7  327778468325442201         1
8  327777161261272775         1
9  327777161261272775         2
Any help would be greatly appreciated.
Answer 0 (score: 2)
You can groupby on 'subid' and then call cumcount, adding 1 because cumcount numbers the rows in each group starting from 0:
In [30]:
df['child_no'] = df.groupby('subid').cumcount()+1
df
Out[30]:
                subid  child_no
0  327598650129611740         1
1  327598650129611740         2
2  327559921352747760         1
3  327676431535405027         1
4  327676431535405027         2
5  327676431535405027         3
6  327662567602840733         1
7  327778468325442201         1
8  327777161261272775         1
9  327777161261272775         2
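As for what goes wrong in the loop from the question: iterrows() yields the index label (0, 1, 2, ...) as its first item, not the subid value, so the comparison never looks at subid at all, and df['child_no'] = ... assigns the same value to the entire column on every pass, which is why every row ends up as 1. The following is a minimal sketch of the whole task using cumcount, including the final step of joining the two columns into a key. The column name unique_key and the "_" separator are assumptions made for illustration; they do not appear in the question.

```python
import pandas as pd

# Sample data from the question, kept as strings.
df = pd.DataFrame({
    "subid": [
        "327598650129611740", "327598650129611740",
        "327559921352747760",
        "327676431535405027", "327676431535405027", "327676431535405027",
        "327662567602840733",
        "327778468325442201",
        "327777161261272775", "327777161261272775",
    ]
})

# cumcount() numbers the rows within each subid group starting at 0, so add 1.
df["child_no"] = df.groupby("subid").cumcount() + 1

# Assumed key format (not from the question): subid and child_no joined by "_".
df["unique_key"] = df["subid"] + "_" + df["child_no"].astype(str)

print(df)
```

Unlike a loop that compares each row with the previous one, cumcount does not require rows with the same subid to be adjacent, so the numbering should still be correct if the data is not sorted.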