So I have some data with duplicate index values that I want to spread into columns. Example:
import pandas as pd
df = pd.DataFrame({
"id":[1,1,1,2,2,3,3,3],
"contact_type":["email","phone","phone","email","mobile","email","phone","mobile"],
"contact":["a@a.ca","123","456","b@b.com","78432","c@c.ca","12","12"]
})
What I'm trying to do is get one row per id. My ideal output is:
ID email phone phone.1 mobile
1 a@a.ca 123 456 NaN
2 b@b.com NaN NaN 78432
3 c@c.ca 12 NaN 12
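Trying df.pivot(index="id", columns="contact_type", values="contact") gives me the error "Index contains duplicate entries, cannot reshape". The problem seems to be that id 1 has two phone entries in contact_type. Is there another way I can get the data into this format?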
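To see exactly which rows trip up pivot, one can flag every row whose (id, contact_type) pair occurs more than once. A minimal sketch using DataFrame.duplicated; it rebuilds the example frame so it runs on its own, and the name dupes is just illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 3],
    "contact_type": ["email", "phone", "phone", "email", "mobile", "email", "phone", "mobile"],
    "contact": ["a@a.ca", "123", "456", "b@b.com", "78432", "c@c.ca", "12", "12"],
})

# pivot needs each (index, columns) pair to occur at most once;
# keep=False marks every member of a repeated pair, not only the later copies
dupes = df[df.duplicated(["id", "contact_type"], keep=False)]
print(dupes)
```

Only the two phone rows of id 1 are flagged. The phone and mobile entries of id 3 share the value 12 but have different contact types, so they do not collide.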
Answer:
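I think you have to assemble the final DataFrame piece by piece (with pd.concat), because you don't know in advance how many different phone numbers an id may have. Assuming each id has at most one email and at most one mobile number: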
# one row per id for the single-valued types (at most one email/mobile per id assumed)
df_mail = df.loc[df.contact_type == 'email', ['contact', 'id']].set_index('id')
df_mobile = df.loc[df.contact_type == 'mobile', ['contact', 'id']].set_index('id')

# .copy() so that adding a column below does not raise SettingWithCopyWarning
df_phone = df.loc[df.contact_type == 'phone', ['contact', 'id']].copy()

# make a column that stores 'Phone0', 'Phone1' and so on (running count of phones within each id):
df_phone['field'] = 'Phone' + df_phone.groupby('id').cumcount().astype(str)
df_phone = df_phone.pivot(index='id', columns='field', values='contact')

df_mail.columns = ['Email']
df_mobile.columns = ['Mobile']

# align the three pieces on id; ids missing from a piece get NaN
print(pd.concat((df_mail, df_phone, df_mobile), axis=1))
Email Phone0 Phone1 Mobile
id
1 a@a.ca 123 456 NaN
2 b@b.com NaN NaN 78432
3 c@c.ca 12 NaN 12
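A different technique, not the one used in this answer: number repeated (id, contact_type) pairs with groupby().cumcount() over both keys and fold that counter into the column label. Every (id, label) pair is then unique, so a single pivot works without assuming at most one email or mobile per id. This is a sketch; the names n, col and wide are illustrative, and the 'phone.1' labels simply mimic the question's desired output:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 3],
    "contact_type": ["email", "phone", "phone", "email", "mobile", "email", "phone", "mobile"],
    "contact": ["a@a.ca", "123", "456", "b@b.com", "78432", "c@c.ca", "12", "12"],
})

# running count within each (id, contact_type) group: 0 for the first phone of an id, 1 for the second, ...
n = df.groupby(["id", "contact_type"]).cumcount()

# first occurrence keeps the bare type; later ones get a suffix: 'phone', 'phone.1', ...
col = df["contact_type"].where(n == 0, df["contact_type"] + "." + n.astype(str))

# each (id, col) pair is now unique, so pivot no longer raises the duplicate-entries error
wide = df.assign(col=col).pivot(index="id", columns="col", values="contact")

# pivot sorts columns alphabetically; restore the order of first appearance
wide = wide[col.drop_duplicates().tolist()]
wide.columns.name = None
print(wide)
```

If some id had three phones, a 'phone.2' column would appear automatically, which addresses the "you don't know in advance how many phone numbers" concern without building each piece separately.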