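Pandas: create extra columns for duplicate entries

Time: 2015-04-23 16:07:39

Tags: python pandas

I have some data in which the values I want as the index and as the columns contain duplicates. Example: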

import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 3],
    "contact_type": ["email", "phone", "phone", "email", "mobile", "email", "phone", "mobile"],
    "contact": ["a@a.ca", "123", "456", "b@b.com", "78432", "c@c.ca", "12", "12"],
})

What I am trying to do is get one row per ID. My ideal output is:

ID    email      phone      phone.1    mobile
1     a@a.ca     123        456        NaN
2     b@b.com    NaN        NaN        78432
3     c@c.ca     12         NaN        12

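Trying df.pivot("id", "contact_type", "contact") gives me the error "Index contains duplicate entries, cannot reshape". The problem seems to be that ID 1 has two phone entries in contact_type, so the (id, contact_type) pair is not unique. Is there another way to get the data into this format?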
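The rows that trigger the error can be listed directly. The following is a minimal sketch, assuming the same example data as above; the variable name `dupes` is introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 3],
    "contact_type": ["email", "phone", "phone", "email", "mobile", "email", "phone", "mobile"],
    "contact": ["a@a.ca", "123", "456", "b@b.com", "78432", "c@c.ca", "12", "12"],
})

# keep=False marks every row that belongs to a repeated (id, contact_type) pair,
# not only the second and later occurrences
dupes = df[df.duplicated(["id", "contact_type"], keep=False)]
print(dupes)
```

Only the two phone rows for ID 1 are selected, which is the pair that pivot cannot place into a single cell.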

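1 answer:

Answer 0: (score: 0)

I think you have to assemble the final DataFrame piece by piece (with pd.concat), because you do not know in advance how many different phone numbers an ID may have. Assuming each ID has at most one email and at most one mobile number: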

In [130]:

df_mail = df.loc[df.contact_type == 'email', ['contact', 'id']].set_index('id')

In [131]:

df_mobile = df.loc[df.contact_type == 'mobile', ['contact', 'id']].set_index('id')

In [132]:

# .copy() so that adding a column below does not raise SettingWithCopyWarning
df_phone = df.loc[df.contact_type == 'phone', ['contact', 'id']].copy()

In [133]:

# build a column holding 'Phone0', 'Phone1' and so on:
df_phone['field'] = 'Phone' + df_phone.groupby('id').cumcount().astype(str)

In [134]:

df_phone = df_phone.pivot(index='id', columns='field', values='contact')

In [135]:

df_mail.columns = ['Email']
df_mobile.columns = ['Mobile']

In [136]:

print(pd.concat((df_mail, df_phone, df_mobile), axis=1))
      Email Phone0 Phone1 Mobile
id                              
1    a@a.ca    123    456    NaN
2   b@b.com    NaN    NaN  78432
3    c@c.ca     12    NaN     12
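
If the assumption of at most one email and one mobile number per ID does not hold, the same numbering idea can be applied to every contact type at once. This is a sketch of that variation, not part of the answer above: the helper names `n`, `field` and `wide` are made up for illustration, and the `.1`, `.2` suffixes are chosen to mimic the column names in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 3],
    "contact_type": ["email", "phone", "phone", "email", "mobile", "email", "phone", "mobile"],
    "contact": ["a@a.ca", "123", "456", "b@b.com", "78432", "c@c.ca", "12", "12"],
})

# Number repeated (id, contact_type) pairs 0, 1, 2, ... within each pair
n = df.groupby(["id", "contact_type"]).cumcount()

# The first occurrence keeps the plain type name; later ones get ".1", ".2", ...
field = df["contact_type"].where(n == 0, df["contact_type"] + "." + n.astype(str))

# (id, field) is now unique, so pivot no longer fails
wide = df.assign(field=field).pivot(index="id", columns="field", values="contact")
print(wide)
```

The resulting columns come out in alphabetical order (email, mobile, phone, phone.1); `wide.reindex(columns=[...])` can be used if a specific order is wanted.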