python pandas: pivot table

Time: 2017-03-16 23:36:40

Tags: python pandas

I have a table like this:

Name   | ID | Contact_method | Contact
sarah    1   house            h1
sarah    1   mobile           m1
sarah    1   email            sarah@mail
bob      2   house            h2
bob      2   mobile           m2
bob      2   email            bob@mail
jones    3   house            h3
jones    3   mobile           m3
jones    3   email            jones@mail
jones    4   house            h4
jones    4   mobile           m4
jones    4   email            jones2@mail

I want it to look like this:

Name  | ID | house | mobile | email
sarah   1    h1      m1       sarah@mail
bob     2    h2      m2       bob@mail
jones   3    h3      m3       jones@mail
jones   4    h4      m4       jones2@mail

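I can already do this, but only with a very expensive pd.concat operation that iterates over every unique ID. Is there a simpler way? I have also tinkered with pivot() and transpose(). Note that duplicate names exist, so I cannot rely on the uniqueness of column values to perform a join.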
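
To make the question reproducible, here is a minimal sketch that rebuilds the sample table. The variable name df is an assumption, chosen because the answers refer to the frame by that name. The values are copied from the table above.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    'Name': ['sarah'] * 3 + ['bob'] * 3 + ['jones'] * 6,
    'ID': [1] * 3 + [2] * 3 + [3] * 3 + [4] * 3,
    'Contact_method': ['house', 'mobile', 'email'] * 4,
    'Contact': ['h1', 'm1', 'sarah@mail',
                'h2', 'm2', 'bob@mail',
                'h3', 'm3', 'jones@mail',
                'h4', 'm4', 'jones2@mail'],
})

# Name alone does not identify a person, but the (Name, ID) pair does.
n_names = df['Name'].nunique()
n_people = len(df[['Name', 'ID']].drop_duplicates())
print(n_names, n_people)
```

The two counts differ because "jones" appears under two IDs, which is why a join on Name alone would mix up the two people.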

3 answers:

Answer 0 (score: 2)

Set the index to every column except 'Contact', then unstack the 'Contact_method' level:

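# rename_axis(columns=None) drops the 'Contact_method' label left on the columns.
# (The positional form rename_axis(None, 1) no longer works in recent pandas.)
df.set_index(
    ['Name', 'ID', 'Contact_method']
)['Contact'].unstack().rename_axis(columns=None).reset_index()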

    Name  ID        email house mobile
0    bob   2     bob@mail    h2     m2
1  jones   3   jones@mail    h3     m3
2  jones   4  jones2@mail    h4     m4
3  sarah   1   sarah@mail    h1     m1
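
As a possible alternative, the same reshape can probably be written with DataFrame.pivot, which accepts a list for index in pandas 1.1 and later. This is a sketch rather than part of the original answer. The name wide is introduced here for illustration, and the small frame repeats part of the question's data so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['sarah'] * 3 + ['jones'] * 6,
    'ID': [1] * 3 + [3] * 3 + [4] * 3,
    'Contact_method': ['house', 'mobile', 'email'] * 3,
    'Contact': ['h1', 'm1', 'sarah@mail',
                'h3', 'm3', 'jones@mail',
                'h4', 'm4', 'jones2@mail'],
})

# Assumes pandas >= 1.1, where pivot() accepts a list of index columns.
wide = (
    df.pivot(index=['Name', 'ID'], columns='Contact_method', values='Contact')
      .rename_axis(columns=None)   # drop the 'Contact_method' columns label
      .reset_index()               # turn Name and ID back into columns
)
print(wide)
```

Like unstack, pivot raises a ValueError if a (Name, ID, Contact_method) combination appears more than once.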

Answer 1 (score: 0)

One approach is to build a dictionary of contacts keyed by ID "manually". I am not sure whether it is more efficient.

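import pandas as pd

# Collect one record per ID, adding a key for each contact method.
people = dict()
for index, row in df.iterrows():
    ID = row['ID']
    if ID not in people:
        people[ID] = {'ID': ID, 'Name': row['Name']}
    people[ID][row['Contact_method']] = row['Contact']

# Each ID becomes a column, so transpose to get one row per ID.
print(pd.DataFrame(people).transpose())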

The output is:

  ID   Name        email house mobile
1  1  sarah   sarah@mail    h1     m1
2  2    bob     bob@mail    h2     m2
3  3  jones   jones@mail    h3     m3
4  4  jones  jones2@mail    h4     m4
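
Whether the dictionary approach is faster can be checked with a rough timing sketch like the one below. The helpers make_long, via_dict and via_unstack are hypothetical names introduced here, and the data is synthetic, so treat the numbers as indicative only. Since iterrows builds a Series for every row, it would not be surprising if the loop were slower on large frames, but that is worth measuring on real data.

```python
import timeit

import pandas as pd


def make_long(n_people):
    """Build a synthetic long-format frame (hypothetical helper)."""
    rows = []
    for i in range(n_people):
        for method in ('house', 'mobile', 'email'):
            rows.append({'Name': f'name{i % 50}', 'ID': i,
                         'Contact_method': method,
                         'Contact': f'{method}{i}'})
    return pd.DataFrame(rows)


def via_dict(df):
    """The row-by-row dictionary approach."""
    people = {}
    for _, row in df.iterrows():
        ID = row['ID']
        if ID not in people:
            people[ID] = {'ID': ID, 'Name': row['Name']}
        people[ID][row['Contact_method']] = row['Contact']
    return pd.DataFrame(people).transpose()


def via_unstack(df):
    """The set_index/unstack approach, for comparison."""
    return (df.set_index(['Name', 'ID', 'Contact_method'])['Contact']
              .unstack().rename_axis(columns=None).reset_index())


long_df = make_long(1000)
print('dict   :', timeit.timeit(lambda: via_dict(long_df), number=3))
print('unstack:', timeit.timeit(lambda: via_unstack(long_df), number=3))
```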

Answer 2 (score: 0)

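I think piRSquared's solution is very good, but you may get this error:

ValueError: Index contains duplicate entries, cannot reshape

This happens when the same Name, ID and Contact_method combination appears more than once, as in this data: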

print (df)
     Name  ID Contact_method      Contact
0   sarah   1          house           h1
1   sarah   1         mobile           m1
2   sarah   1          email   sarah@mail
3     bob   2          house           h2
4     bob   2         mobile           m2
5     bob   2          email     bob@mail
6   jones   3          house           h3
7   jones   3         mobile           m3
8   jones   3          email   jones@mail <- duplicate: same Name, ID and Contact_method as row 9
9   jones   3          email     joe@mail <- duplicate: same Name, ID and Contact_method as row 8
10  jones   4          house           h4
11  jones   4         mobile           m4
12  jones   4          email  jones2@mail
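
To check whether a frame has this problem before reshaping it, something like the following sketch may help. The names keys and dupes are introduced here for illustration, and the small frame contains only the ID 3 rows from the data above.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['jones'] * 4,
    'ID': [3] * 4,
    'Contact_method': ['house', 'mobile', 'email', 'email'],
    'Contact': ['h3', 'm3', 'jones@mail', 'joe@mail'],
})

keys = ['Name', 'ID', 'Contact_method']

# keep=False marks every row of a duplicated key, not only the repeats.
dupes = df[df.duplicated(subset=keys, keep=False)]
print(dupes)

# With a duplicated key, a plain unstack cannot decide which value to keep.
try:
    df.set_index(keys)['Contact'].unstack()
except ValueError as exc:
    print('unstack failed:', exc)
```

If dupes is empty, the plain set_index/unstack approach should work without any aggregation.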

In that case, use pivot_table or groupby and aggregate the duplicates with join:

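cols = ['Name','ID','house','mobile','email']
# aggfunc=','.join concatenates the contacts that share a key.
# The chain is wrapped in parentheses so it can span several lines.
df1 = (df.pivot_table(index=['ID','Name'],
                      columns='Contact_method',
                      values='Contact',
                      aggfunc=','.join)
         .rename_axis(columns=None)
         .reset_index()
         .reindex(columns=cols))
print(df1)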
    Name  ID house mobile                email
0  sarah   1    h1     m1           sarah@mail
1    bob   2    h2     m2             bob@mail
2  jones   3    h3     m3  jones@mail,joe@mail <- join duplicates
3  jones   4    h4     m4          jones2@mail

# Group by ID first so the rows come out in ID order, as in the output below.
df1 = (df.groupby(['ID', 'Name', 'Contact_method'])['Contact']
         .apply(','.join)
         .unstack()
         .rename_axis(columns=None)
         .reset_index()
         .reindex(columns=cols))
print(df1)
    Name  ID house mobile                email
0  sarah   1    h1     m1           sarah@mail
1    bob   2    h2     m2             bob@mail
2  jones   3    h3     m3  jones@mail,joe@mail <- join duplicates
3  jones   4    h4     m4          jones2@mail
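
If a comma-joined string is not what you want, the aggregation step can be swapped for something else. The sketch below is a variation, not part of the answer above: it shows keeping only the first contact per key, or collecting all contacts into a list. The names grouped, first_only and as_lists are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['jones'] * 4,
    'ID': [3] * 4,
    'Contact_method': ['house', 'mobile', 'email', 'email'],
    'Contact': ['h3', 'm3', 'jones@mail', 'joe@mail'],
})

keys = ['Name', 'ID', 'Contact_method']
grouped = df.groupby(keys)['Contact']

# Variation 1: keep only the first contact seen for each key.
first_only = (grouped.first()
                     .unstack()
                     .rename_axis(columns=None)
                     .reset_index())

# Variation 2: collect every contact for a key into a Python list.
as_lists = (grouped.agg(list)
                   .unstack()
                   .rename_axis(columns=None)
                   .reset_index())

print(first_only)
print(as_lists)
```

Which variation fits depends on whether the extra contacts matter downstream. Keeping the first one silently discards data, while lists keep everything but are harder to filter on.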