My DataFrame looks like this:
import pandas as pd
import numpy as np
df = pd.DataFrame({'id' : range(1,9),
'code' : ['one', 'one', 'two', 'three',
'two', 'three', 'one', 'two'],
'colour': ['black', 'white','white','white',
'black', 'black', 'white', 'white'],
'amount' : np.random.randn(8)}, columns= ['id','code','colour','amount'])
I would like to group the ids by code and colour and then sort them with respect to amount. I know how to use groupby():
df.groupby(['code','colour']).head(5)
                   id   code colour    amount
code  colour
one   black  0      1    one  black -0.117307
      white  1      2    one  white  1.653216
             6      7    one  white  0.817205
three black  5      6  three  black  0.567162
      white  3      4  three  white  0.579074
two   black  4      5    two  black -1.683988
      white  2      3    two  white -0.457722
             7      8    two  white -1.277020
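To see what the DataFrameGroupBy object actually contains, it helps to iterate over it. The sketch below rebuilds the question's frame (the `np.random.seed(0)` call is an added assumption, only there so the numbers repeat) and shows that each group arrives as a `(code, colour)` key tuple paired with the matching sub-DataFrame.

```python
import numpy as np
import pandas as pd

# Same frame as in the question; the seed is an added assumption for repeatability
np.random.seed(0)
df = pd.DataFrame({'id': range(1, 9),
                   'code': ['one', 'one', 'two', 'three',
                            'two', 'three', 'one', 'two'],
                   'colour': ['black', 'white', 'white', 'white',
                              'black', 'black', 'white', 'white'],
                   'amount': np.random.randn(8)},
                  columns=['id', 'code', 'colour', 'amount'])

# Iterating a DataFrameGroupBy yields (key, sub-DataFrame) pairs;
# with two grouping columns, each key is a (code, colour) tuple
keys = []
for key, group in df.groupby(['code', 'colour']):
    keys.append(key)
    print(key, group['id'].tolist())
```

Because groupby sorts the keys by default, the groups come out in lexicographic order ('one' < 'three' < 'two'), which matches the order of the desired output below.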
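However, the output I want looks like the following, with two columns: 1. code/colour, containing the key strings, and 2. id:amount, containing id-amount pairs sorted in descending order with respect to amount: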
code/colour id:amount
one/black {1:-0.117307}
one/white {2:1.653216, 7:0.817205}
three/black {6:0.567162}
three/white {4:0.579074}
two/black {5:-1.683988}
two/white {3:-0.457722, 8:-1.277020}
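One way to get exactly this two-column layout is a minimal sketch along these lines (the seed is again an added assumption): sort the whole frame by amount in descending order first, then group. groupby preserves the row order inside each group, and Python 3.7+ dicts keep insertion order, so each id:amount dict comes out largest-first.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # added assumption: fixed seed only so the example is repeatable
df = pd.DataFrame({'id': range(1, 9),
                   'code': ['one', 'one', 'two', 'three',
                            'two', 'three', 'one', 'two'],
                   'colour': ['black', 'white', 'white', 'white',
                              'black', 'black', 'white', 'white'],
                   'amount': np.random.randn(8)},
                  columns=['id', 'code', 'colour', 'amount'])

# Sort once, largest amount first; groupby keeps this row order within each group
ordered = df.sort_values('amount', ascending=False)

rows = []
for (code, colour), group in ordered.groupby(['code', 'colour']):
    # zip pairs each id with its amount, preserving the descending order
    rows.append(('%s/%s' % (code, colour),
                 dict(zip(group['id'], group['amount']))))

result = pd.DataFrame(rows, columns=['code/colour', 'id:amount'])
print(result)
```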
How can I convert the DataFrameGroupBy object shown above into the format I want? Or should I not be using groupby() in the first place?
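On the second question: groupby is not strictly required. A plain-Python sketch (one possible alternative, not the only one) walks the rows in descending amount order and buckets them by a "code/colour" string, using `dict.setdefault` to create each inner dict on first use.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # added assumption: fixed seed only so the example is repeatable
df = pd.DataFrame({'id': range(1, 9),
                   'code': ['one', 'one', 'two', 'three',
                            'two', 'three', 'one', 'two'],
                   'colour': ['black', 'white', 'white', 'white',
                              'black', 'black', 'white', 'white'],
                   'amount': np.random.randn(8)},
                  columns=['id', 'code', 'colour', 'amount'])

# Visit rows from largest to smallest amount and bucket them by key;
# each inner dict therefore receives its ids in descending-amount order
buckets = {}
for row in df.sort_values('amount', ascending=False).itertuples(index=False):
    key = '%s/%s' % (row.code, row.colour)
    buckets.setdefault(key, {})[row.id] = row.amount

for key in sorted(buckets):
    print(key, buckets[key])
```

This avoids groupby entirely, but for larger frames the groupby-based versions are usually easier to extend.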
EDIT: Although it is not in the specified format, the following code gives me the functionality I want:
groups = dict(list(df.groupby(['code','colour'])))
groups['one','white']
   id code colour    amount
1   2  one  white  1.331766
6   7  one  white  0.808739
How can I reduce the groups so that they contain only the id and amount columns?
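A minimal sketch for that last question: build the same dictionary with a dict comprehension, selecting only the `id` and `amount` columns from each group. Keys stay `(code, colour)` tuples, so `groups['one', 'white']` still works as in the EDIT.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # added assumption: fixed seed only so the example is repeatable
df = pd.DataFrame({'id': range(1, 9),
                   'code': ['one', 'one', 'two', 'three',
                            'two', 'three', 'one', 'two'],
                   'colour': ['black', 'white', 'white', 'white',
                              'black', 'black', 'white', 'white'],
                   'amount': np.random.randn(8)},
                  columns=['id', 'code', 'colour', 'amount'])

# Like dict(list(df.groupby(...))), but keep only the id and amount columns
groups = {key: group[['id', 'amount']]
          for key, group in df.groupby(['code', 'colour'])}

print(groups['one', 'white'])
```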
Answer 0 (score: 9)
First, group by code and colour, then apply a custom function that formats id and amount:
df = df.groupby(['code', 'colour']).apply(lambda x:x.set_index('id').to_dict('dict')['amount'])
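To see what the lambda does to a single group, the sketch below runs it on a hypothetical one-group frame whose values are copied from the question's two/white row. `set_index('id').to_dict('dict')['amount']` is equivalent to the more direct `dict(zip(group['id'], group['amount']))`. In both cases the dict simply follows the group's row order, so the descending sort the question asks for has to happen before this step (for example with `df.sort_values('amount', ascending=False)` ahead of the groupby).

```python
import pandas as pd

# Hypothetical single group, using the two/white values from the desired output
group = pd.DataFrame({'id': [3, 8],
                      'code': ['two', 'two'],
                      'colour': ['white', 'white'],
                      'amount': [-0.457722, -1.277020]})

# What the lambda computes: index by id, then take the amount column as a dict
via_lambda = group.set_index('id').to_dict('dict')['amount']

# An equivalent, more direct spelling
via_zip = dict(zip(group['id'], group['amount']))

print(via_lambda)
```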
Then modify the index, joining each (code, colour) tuple into a single "code/colour" string:
df.index = ['/'.join(i) for i in df.index]
The result is a Series, which you can turn back into a DataFrame with:
df = df.reset_index()
Finally, set the column names:
df.columns=['code/colour','id:amount']
Result:
In [105]: df
Out[105]:
code/colour id:amount
0 one/black {1: 0.392264412544}
1 one/white {2: 2.13950686015, 7: -0.393002947047}
2 three/black {6: -2.0766612539}
3 three/white {4: -1.18058561325}
4 two/black {5: -1.51959565941}
5 two/white {8: -1.7659863039, 3: -0.595666853895}
Answer 1 (score: 1)
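Here is an "ugly" way of doing it. First of all, because a dict is not a natural fit for a DataFrame cell, your desired output does not play well with pandas, so you may lose the real benefits of using it!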
from collections import OrderedDict

od = OrderedDict()
for name, group in df.groupby(['code', 'colour']):
    # Convert the group, sorted by amount (largest first), to a dict of columns
    temp = group[['id', 'amount']].sort_values('amount', ascending=False).to_dict()
    # Extract id:amount pairs; the sorted row order is preserved
    temp2 = {temp['id'][key]: temp['amount'][key] for key in temp['amount']}
    od["%s/%s" % name] = temp2
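This is only a starting point and not exactly what you want: it produces an OrderedDict keyed by "code/colour" rather than a two-column DataFrame.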
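If the two-column DataFrame is still wanted, each (key, dict) item of such an OrderedDict can become one row. The `od` below is a hypothetical stand-in for the one built in the loop, filled with a few values copied from the question's desired output.

```python
from collections import OrderedDict

import pandas as pd

# Hypothetical stand-in for the OrderedDict built by the loop
od = OrderedDict([('one/black', {1: -0.117307}),
                  ('two/white', {3: -0.457722, 8: -1.277020})])

# Each (key, dict) item becomes one row; the dicts are stored as object cells
result = pd.DataFrame(list(od.items()), columns=['code/colour', 'id:amount'])
print(result)
```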