Pandas: pivot on rank

Time: 2016-04-06 00:18:31

Tags: python pandas pivot dataframe

Given this data:

pd.DataFrame({'id':['aaa','aaa','abb','abb','abb','acd','acd','acd'],
              'loc':['US','UK','FR','US','IN','US','CN','CN']})

    id loc
0  aaa  US
1  aaa  UK
2  abb  FR
3  abb  US
4  abb  IN
5  acd  US
6  acd  CN
7  acd  CN

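How can I pivot it so that each id becomes a single row, with its loc values spread across columns in order of appearance? I am looking for the most idiomatic way to do this.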

1 Answer:

Answer 0: (score: 2)

I think you can use groupby with cumcount to create a new column cols, convert the counter to string with astype, and finally use pivot:

df['cols'] = 'loc' + (df.groupby('id')['id'].cumcount() + 1).astype(str)
print(df)
    id loc  cols
0  aaa  US  loc1
1  aaa  UK  loc2
2  abb  FR  loc1
3  abb  US  loc2
4  abb  IN  loc3
5  acd  US  loc1
6  acd  CN  loc2
7  acd  CN  loc3

print(df.pivot(index='id', columns='cols', values='loc'))
cols loc1 loc2  loc3
id                  
aaa    US   UK  None
abb    FR   US    IN
acd    US   CN    CN

If you want to remove the index and column names, use rename_axis:

print(df.pivot(index='id', columns='cols', values='loc')
        .rename_axis(None)
        .rename_axis(None, axis=1))
    loc1 loc2  loc3
aaa   US   UK  None
abb   FR   US    IN
acd   US   CN    CN

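Putting it all together (thanks, Colin):

# pd.pivot(index, columns, values): positional form from the pandas version used here (0.18.0)
print(pd.pivot(df['id'], 'loc' + (df.groupby('id').cumcount() + 1).astype(str), df['loc'])
        .rename_axis(None)
        .rename_axis(None, axis=1))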

    loc1 loc2  loc3
aaa   US   UK  None
abb   FR   US    IN
acd   US   CN    CN    
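
The positional pd.pivot(index, columns, values) call matches the pandas version this answer was written against; later pandas releases expect the DataFrame as the first argument of pd.pivot. Below is a self-contained sketch of the same chain using the DataFrame.pivot method with keyword arguments, assuming a reasonably recent pandas. The variable name wide is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'id': ['aaa', 'aaa', 'abb', 'abb', 'abb', 'acd', 'acd', 'acd'],
                   'loc': ['US', 'UK', 'FR', 'US', 'IN', 'US', 'CN', 'CN']})

wide = (
    # Label each row by its position within its id: loc1, loc2, ...
    df.assign(cols='loc' + (df.groupby('id').cumcount() + 1).astype(str))
      # One row per id, one column per label; gaps become missing values.
      .pivot(index='id', columns='cols', values='loc')
      # Drop the index name ('id') and the columns name ('cols').
      .rename_axis(None)
      .rename_axis(None, axis=1)
)
print(wide)
```

Using assign keeps the helper column out of the original df, so the input frame is left unchanged.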

I tried rank, but in version 0.18.0 I get an error:

print(df.groupby('id')['loc'].transform(lambda x: x.rank(method='first')))
#ValueError: first not supported for non-numeric data
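
The error concerns the dtype of loc, not the grouping. As a rough sketch of the difference: cumcount numbers rows by their position within each group and works for any dtype, while rank(method='first') orders by value and breaks ties by order of appearance. The score column below is hypothetical and exists only to show rank on numeric data.

```python
import pandas as pd

df = pd.DataFrame({'id': ['aaa', 'aaa', 'abb', 'abb', 'abb'],
                   'loc': ['US', 'UK', 'FR', 'US', 'IN'],
                   'score': [2.0, 1.0, 5.0, 5.0, 3.0]})  # hypothetical numeric column

# Position of each row within its id group; independent of the column values.
position = df.groupby('id').cumcount() + 1

# Value-based rank within each id; ties go to the row that appears first.
value_rank = df.groupby('id')['score'].rank(method='first')

print(position.tolist())
print(value_rank.tolist())
```

For the pivot in this answer only the position within each id is needed, which is why cumcount is enough.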