Here is a subset of my data frame:
drug_id WD
lexapro.1 flu-like symptoms
lexapro.1 dizziness
lexapro.1 headache
lexapro.14 Dizziness
lexapro.14 headaches
lexapro.23 extremely difficult
lexapro.32 cry at anything
lexapro.32 Anxiety
I need to generate an id column based on the values in drug_id, as follows:
id drug_id WD
1 lexapro.1 flu-like symptoms
1 lexapro.1 dizziness
1 lexapro.1 headache
2 lexapro.14 Dizziness
2 lexapro.14 headaches
3 lexapro.23 extremely difficult
4 lexapro.32 cry at anything
4 lexapro.32 Anxiety
I think I need to group the rows by drug_id and then generate the id based on the size of each group, but I don't know how to do that.
Answer 0 (score: 1)
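The shift + cumsum pattern Boud mentions works well; just make sure to sort by drug_id first, so that rows with the same drug_id are adjacent. For example: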
df = df.sort_values('drug_id')
df['id'] = (df['drug_id'] != df['drug_id'].shift()).cumsum()
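To see why the sort matters, here is a minimal sketch on made-up data (the four rows below are hypothetical, not from the question) where the same drug_id appears in two separate runs. Without the sort, shift + cumsum starts a new id at every change of value.

```python
import pandas as pd

# Hypothetical data: each drug_id appears in two non-adjacent rows.
df = pd.DataFrame(
    {"drug_id": ["lexapro.1", "lexapro.14", "lexapro.1", "lexapro.14"]}
)

# Without sorting, every change of value opens a new id.
unsorted_ids = (df["drug_id"] != df["drug_id"].shift()).cumsum()

# After sorting, equal values are adjacent, so each drug_id gets one id.
sorted_df = df.sort_values("drug_id")
sorted_ids = (sorted_df["drug_id"] != sorted_df["drug_id"].shift()).cumsum()

print(unsorted_ids.tolist())
print(sorted_ids.tolist())
```

Note that the sort is lexicographic on the strings, so the ids follow string order rather than the numeric suffix.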
Another approach, which does not involve sorting the data frame, is to map a number to each unique drug_id:
uid = df['drug_id'].unique()
id_map = {drug: i for i, drug in enumerate(uid, start=1)}
df['id'] = df['drug_id'].map(id_map)
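As a possible shortcut for the same idea, pandas also has pd.factorize, which assigns integer codes in order of first appearance. This is a sketch on made-up rows (hypothetical data), not part of the original answer; the codes start at 0, hence the + 1.

```python
import pandas as pd

# Hypothetical, unsorted data with a repeated drug_id.
df = pd.DataFrame(
    {"drug_id": ["lexapro.14", "lexapro.1", "lexapro.14", "lexapro.32"]}
)

# factorize returns 0-based codes in order of first appearance,
# plus the unique values those codes refer to.
codes, uniques = pd.factorize(df["drug_id"])
df["id"] = codes + 1

print(df)
```

Like the dict-based mapping, this numbers the ids by first appearance rather than by sorted order.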
Answer 1 (score: 0)
Use the shift + cumsum pattern:
(df.drug_id!=df.drug_id.shift()).cumsum()
Out[5]:
0 1
1 1
2 1
3 2
4 2
5 3
6 4
7 4
Name: drug_id, dtype: int32
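For anyone who wants to reproduce this, here is a self-contained sketch that rebuilds the sample from the question and attaches the result as an id column. The column reordering at the end is just for display, to match the layout the question asks for.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame(
    {
        "drug_id": [
            "lexapro.1", "lexapro.1", "lexapro.1",
            "lexapro.14", "lexapro.14",
            "lexapro.23",
            "lexapro.32", "lexapro.32",
        ],
        "WD": [
            "flu-like symptoms", "dizziness", "headache",
            "Dizziness", "headaches",
            "extremely difficult",
            "cry at anything", "Anxiety",
        ],
    }
)

# A row starts a new group when its drug_id differs from the previous row's;
# the cumulative sum of those "new group" flags is the running id.
df["id"] = (df["drug_id"] != df["drug_id"].shift()).cumsum()

# Put id first, as in the desired output.
df = df[["id", "drug_id", "WD"]]
print(df)
```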