Generating a column based on the values in another column in pandas (Python)

Time: 2017-01-31 02:53:58

Tags: python pandas

Here is a subset of my dataframe:

 drug_id    WD
lexapro.1   flu-like symptoms
lexapro.1   dizziness
lexapro.1   headache
lexapro.14  Dizziness
lexapro.14  headaches
lexapro.23  extremely difficult 
lexapro.32  cry at anything
lexapro.32  Anxiety 

I need to generate an id column based on the values in drug_id, like this:

id    drug_id        WD
1       lexapro.1   flu-like symptoms
1       lexapro.1   dizziness
1       lexapro.1   headache 
2       lexapro.14  Dizziness
2       lexapro.14  headaches
3       lexapro.23   extremely difficult 
4       lexapro.32  cry at anything
4       lexapro.32  Anxiety 

I think I need to group the rows by drug_id and then generate the id based on the size of each group, but I don't know how to do that.

2 answers:

Answer 0: (score: 1)

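The shift + cumsum pattern mentioned by Boud works well; just make sure to sort by drug_id first, so that rows with the same drug_id are adjacent. For example: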

df = df.sort_values('drug_id')
df['id'] = (df['drug_id'] != df['drug_id'].shift()).cumsum()
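To illustrate why the sort matters, here is a minimal sketch on a small made-up frame (the three-row data is an assumption for illustration, not taken from the question). Without sorting, a drug_id that reappears later is counted as a new group:

```python
import pandas as pd

# Hypothetical data: lexapro.1 appears in two separate runs.
df = pd.DataFrame({"drug_id": ["lexapro.1", "lexapro.14", "lexapro.1"]})

# Without sorting, each change of value starts a new id,
# so the second run of lexapro.1 gets its own id.
unsorted_ids = (df["drug_id"] != df["drug_id"].shift()).cumsum()

# After sorting, equal drug_id values are adjacent and share one id.
sorted_df = df.sort_values("drug_id")
sorted_ids = (sorted_df["drug_id"] != sorted_df["drug_id"].shift()).cumsum()

print(unsorted_ids.tolist())  # [1, 2, 3]
print(sorted_ids.tolist())    # [1, 1, 2]
```

Note that sorting also means the ids follow the sorted order of drug_id rather than the order in which the values first appear.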

Another approach, which does not involve sorting the dataframe, is to map a number to each unique drug_id:

# Unique drug_id values, in order of first appearance
uid = df['drug_id'].unique()
# Map each unique drug_id to a 1-based integer
id_map = {drug: i for i, drug in enumerate(uid, start=1)}
df['id'] = df['drug_id'].map(id_map)
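As a possible alternative to building the mapping by hand (this is a different technique from the one in this answer), pandas' `pd.factorize` numbers values in order of first appearance, and `groupby(...).ngroup()` does the same per group, assuming a pandas version that provides `ngroup`. A minimal sketch using the drug_id values from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "drug_id": ["lexapro.1", "lexapro.1", "lexapro.1",
                "lexapro.14", "lexapro.14",
                "lexapro.23",
                "lexapro.32", "lexapro.32"],
})

# factorize returns 0-based integer codes in order of first appearance,
# plus the array of unique values; add 1 to get 1-based ids.
codes, uniques = pd.factorize(df["drug_id"])
df["id"] = codes + 1

# Equivalent numbering via groupby; sort=False keeps first-appearance order.
df["id_ngroup"] = df.groupby("drug_id", sort=False).ngroup() + 1

print(df)
```

Neither variant requires the rows to be sorted or the equal values to be adjacent.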

Answer 1: (score: 0)

Use the shift + cumsum pattern:

(df.drug_id!=df.drug_id.shift()).cumsum()
Out[5]: 
0    1
1    1
2    1
3    2
4    2
5    3
6    4
7    4
Name: drug_id, dtype: int32
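
For readers who want to see how the one-liner works, here is a self-contained sketch that rebuilds the sample data from the question and splits the pattern into its steps. The intermediate names `previous` and `is_new_group` are introduced here for illustration only:

```python
import pandas as pd

df = pd.DataFrame({
    "drug_id": ["lexapro.1", "lexapro.1", "lexapro.1",
                "lexapro.14", "lexapro.14",
                "lexapro.23",
                "lexapro.32", "lexapro.32"],
    "WD": ["flu-like symptoms", "dizziness", "headache",
           "Dizziness", "headaches",
           "extremely difficult",
           "cry at anything", "Anxiety"],
})

# Step 1: shift the column down by one row, so each row sees the previous value.
previous = df["drug_id"].shift()

# Step 2: True wherever the value differs from the row above
# (the first row compares against NaN, so it is always True).
is_new_group = df["drug_id"] != previous

# Step 3: the cumulative sum of the booleans gives a running group counter.
df["id"] = is_new_group.cumsum()

print(df[["id", "drug_id", "WD"]])
```

This relies on equal drug_id values being adjacent, as they are in the sample data.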