I have a table like this:
Name  | ID | Contact_method | Contact
sarah | 1  | house          | h1
sarah | 1  | mobile         | m1
sarah | 1  | email          | sarah@mail
bob   | 2  | house          | h2
bob   | 2  | mobile         | m2
bob   | 2  | email          | bob@mail
jones | 3  | house          | h3
jones | 3  | mobile         | m3
jones | 3  | email          | jones@mail
jones | 4  | house          | h4
jones | 4  | mobile         | m4
jones | 4  | email          | jones2@mail
I want it to look like this:
Name  | ID | house | mobile | email
sarah | 1  | h1    | m1     | sarah@mail
bob   | 2  | h2    | m2     | bob@mail
jones | 3  | h3    | m3     | jones@mail
jones | 4  | h4    | m4     | jones2@mail
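I can already do this, but only with a very expensive pd.concat operation that iterates over every unique ID. Is there a simpler way? I have also tinkered with pivot() and transpose(). Note that duplicate names exist, so I cannot rely on a column's values being unique to perform a join.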
Answer 0 (score: 2)
Set the index to every column except 'Contact', then unstack:
df.set_index(
    ['Name', 'ID', 'Contact_method']
)['Contact'].unstack().rename_axis(None, axis=1).reset_index()
Name ID email house mobile
0 bob 2 bob@mail h2 m2
1 jones 3 jones@mail h3 m3
2 jones 4 jones2@mail h4 m4
3 sarah 1 sarah@mail h1 m1
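For anyone who wants to try this end to end, here is a minimal self-contained sketch. The DataFrame construction and the variable name `wide` are my own assumptions for illustration; the reshaping chain is the one shown above.

```python
import pandas as pd

# Rebuild the question's long-format table (construction is illustrative).
df = pd.DataFrame({
    "Name": ["sarah"] * 3 + ["bob"] * 3 + ["jones"] * 6,
    "ID": [1] * 3 + [2] * 3 + [3] * 3 + [4] * 3,
    "Contact_method": ["house", "mobile", "email"] * 4,
    "Contact": ["h1", "m1", "sarah@mail",
                "h2", "m2", "bob@mail",
                "h3", "m3", "jones@mail",
                "h4", "m4", "jones2@mail"],
})

wide = (
    df.set_index(["Name", "ID", "Contact_method"])["Contact"]
      .unstack()                  # Contact_method values become columns
      .rename_axis(None, axis=1)  # drop the leftover "Contact_method" columns label
      .reset_index()              # turn Name and ID back into ordinary columns
)
print(wide)
```

Because (Name, ID) together form the row key, the two people named jones stay on separate rows.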
Answer 1 (score: 0)
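One approach is to build a dictionary of contacts keyed by ID "manually". I am not sure whether it is any more efficient.
import pandas as pd

# df is the long-format table from the question
people = dict()
for index, row in df.iterrows():
    ID = row['ID']
    if ID not in people:
        # first time this ID is seen: start its record
        people[ID] = {'ID': ID, 'Name': row['Name']}
    people[ID][row['Contact_method']] = row['Contact']
print(pd.DataFrame(people).transpose())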
Output:
ID Name email house mobile
1 1 sarah sarah@mail h1 m1
2 2 bob bob@mail h2 m2
3 3 jones jones@mail h3 m3
4 4 jones jones2@mail h4 m4
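A variant of the same dictionary idea, in case it is useful: this sketch wraps the loop in a function, iterates with itertuples(), and builds the frame with DataFrame.from_dict(orient="index") so no transpose is needed. The function name `widen_by_id` and the sample data construction are my own assumptions, and I have not measured whether it is faster.

```python
import pandas as pd

def widen_by_id(df):
    """Collect one record per ID, then build a frame with one row per record."""
    people = {}
    for row in df.itertuples(index=False):
        # setdefault creates the record the first time an ID is seen
        entry = people.setdefault(row.ID, {"ID": row.ID, "Name": row.Name})
        entry[row.Contact_method] = row.Contact
    # orient="index": dictionary keys (the IDs) become the row labels
    return pd.DataFrame.from_dict(people, orient="index")

# Illustrative copy of the question's table.
df = pd.DataFrame({
    "Name": ["sarah"] * 3 + ["bob"] * 3 + ["jones"] * 6,
    "ID": [1] * 3 + [2] * 3 + [3] * 3 + [4] * 3,
    "Contact_method": ["house", "mobile", "email"] * 4,
    "Contact": ["h1", "m1", "sarah@mail",
                "h2", "m2", "bob@mail",
                "h3", "m3", "jones@mail",
                "h4", "m4", "jones2@mail"],
})

wide = widen_by_id(df)
print(wide)
```

Note that this keys on ID alone, so it assumes an ID never maps to two different names.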
Answer 2 (score: 0)
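I think piRSquared's solution is very good, but it fails when the data contains duplicates. In that case you get:
ValueError: Index contains duplicate entries, cannot reshape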
print (df)
Name ID Contact_method Contact
0 sarah 1 house h1
1 sarah 1 mobile m1
2 sarah 1 email sarah@mail
3 bob 2 house h2
4 bob 2 mobile m2
5 bob 2 email bob@mail
6 jones 3 house h3
7 jones 3 mobile m3
8 jones 3 email jones@mail <- duplicate: same Name, ID and Contact_method
9 jones 3 email joe@mail <- duplicate: same Name, ID and Contact_method
10 jones 4 house h4
11 jones 4 mobile m4
12 jones 4 email jones2@mail
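To check whether duplicates are really what triggers the error, something along these lines may help. It is only a sketch: the names `keys`, `dupes` and `unstack_failed` are mine, and the frame is rebuilt by hand from the printout above.

```python
import pandas as pd

# Long-format table with one duplicated (Name, ID, Contact_method) combination.
df = pd.DataFrame({
    "Name": ["sarah"] * 3 + ["bob"] * 3 + ["jones"] * 7,
    "ID": [1] * 3 + [2] * 3 + [3] * 4 + [4] * 3,
    "Contact_method": ["house", "mobile", "email"] * 2
                      + ["house", "mobile", "email", "email"]
                      + ["house", "mobile", "email"],
    "Contact": ["h1", "m1", "sarah@mail",
                "h2", "m2", "bob@mail",
                "h3", "m3", "jones@mail", "joe@mail",
                "h4", "m4", "jones2@mail"],
})

keys = ["Name", "ID", "Contact_method"]

# keep=False marks every row of a duplicated key combination, not just the repeats
dupes = df[df.duplicated(subset=keys, keep=False)]
print(dupes)

# unstack needs a unique index, so the duplicated key makes it raise
try:
    df.set_index(keys)["Contact"].unstack()
    unstack_failed = False
except ValueError as err:
    unstack_failed = True
    print("unstack failed:", err)
```

If `dupes` comes back empty, the plain set_index/unstack approach should be enough.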
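Use pivot_table or groupby, aggregating the duplicates with join:
cols = ['Name', 'ID', 'house', 'mobile', 'email']
# the outer parentheses let the method chain span several lines
df1 = (df.pivot_table(index=['ID', 'Name'],
                      columns='Contact_method',
                      values='Contact',
                      aggfunc=','.join)
         .rename_axis(None, axis=1)
         .reset_index()
         .reindex(columns=cols))  # reindex_axis was removed from pandas
print(df1)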
Name ID house mobile email
0 sarah 1 h1 m1 sarah@mail
1 bob 2 h2 m2 bob@mail
2 jones 3 h3 m3 jones@mail,joe@mail <- join duplicates
3 jones 4 h4 m4 jones2@mail
# ID comes first in the group keys so the rows sort by ID, as in the output below
df1 = (df.groupby(['ID', 'Name', 'Contact_method'])['Contact']
         .apply(','.join)
         .unstack()
         .rename_axis(None, axis=1)
         .reset_index()
         .reindex(columns=cols))
print(df1)
Name ID house mobile email
0 sarah 1 h1 m1 sarah@mail
1 bob 2 h2 m2 bob@mail
2 jones 3 h3 m3 jones@mail,joe@mail <- join duplicates
3 jones 4 h4 m4 jones2@mail
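Here is a self-contained sketch of the groupby variant for anyone who wants to run it directly. I rebuilt the data by hand and used .agg(','.join) rather than .apply; for this case I would expect both to give the same result, but treat that as my assumption rather than part of the answer above.

```python
import pandas as pd

# Same data as above, including the duplicated email row for jones / ID 3.
df = pd.DataFrame({
    "Name": ["sarah"] * 3 + ["bob"] * 3 + ["jones"] * 7,
    "ID": [1] * 3 + [2] * 3 + [3] * 4 + [4] * 3,
    "Contact_method": ["house", "mobile", "email"] * 2
                      + ["house", "mobile", "email", "email"]
                      + ["house", "mobile", "email"],
    "Contact": ["h1", "m1", "sarah@mail",
                "h2", "m2", "bob@mail",
                "h3", "m3", "jones@mail", "joe@mail",
                "h4", "m4", "jones2@mail"],
})

cols = ["Name", "ID", "house", "mobile", "email"]

wide = (
    df.groupby(["ID", "Name", "Contact_method"])["Contact"]
      .agg(",".join)              # collapse duplicates into one comma-separated string
      .unstack()                  # Contact_method values become columns
      .rename_axis(None, axis=1)  # drop the "Contact_method" columns label
      .reset_index()
      .reindex(columns=cols)      # put the columns in the requested order
)
print(wide)
```

Joining with a comma is just one choice of aggregation; if only one value per cell is wanted, an aggregation such as 'first' could be passed to .agg instead.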