Iterating over a multi-index DataFrame (Python) and assigning the index to index-value pairs

Time: 2016-08-02 15:43:22

Tags: python pandas group-by

I'm at my wits' end. I have a three-column DataFrame (aff_id, mkt and bkgs) that I group by two of the columns (aff_id and mkt):

df_gb_aff = df.groupby(["aff_id", "mkt"]).sum()
df_gb_aff.sort_values("bkgs", ascending=False, inplace=True)

This gives me a multi-index DataFrame that looks something like this:

                                bkgs
aff_id          mkt
2508b863a1a4    bcab9d6ec630    1910.707124
6cc5f0e8c96b    b7d0dbd38376    1374.924684
188e238326e4    446bb566f202    1206.589522
                dbe759c691eb    1203.979908
6cc5f0e8c96b    0e9013464c4c    1203.532310

What I want to do now is iterate over each aff_id and build a dict of mkt (key) to bkgs (value) pairs. Because each aff_id has a different set of mkt values, Python throws an error whenever df_gb_aff.loc[index_1, index_2] does not exist.

I have been getting the index values with:

aff_list = df_gb_aff.index.levels[0].values
mkt_list = df_gb_aff.index.levels[1].values

and then trying to iterate over them.

Does anyone have a sensible way of doing this?

1 answer:

Answer 0: (score: 1)

Another solution, using a dict comprehension:

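# .ix has been removed from pandas; .loc with the index tuple and column name returns the same value
d = {idx[1]: df_gb_aff.loc[idx, 'bkgs'] for idx in df_gb_aff.index}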

print (d)
{'446bb566f202': 1206.589522, 
'bcab9d6ec630': 1910.7071239999998, 
'0e9013464c4c': 1203.5323100000001, 
'dbe759c691eb': 1203.979908, 
'b7d0dbd38376': 1374.9246840000001}

print (d['bcab9d6ec630'])
1910.707124
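
The comprehension above merges every aff_id into a single dict, so a mkt that appears under two affiliates would keep only one value. If one dict per aff_id is what is needed, as the question describes, a minimal sketch might look like the following. The sample frame is rebuilt from the rows shown in the question, and the name per_aff is made up for illustration.

```python
import pandas as pd

# Sample data rebuilt from the rows shown in the question (an assumption;
# the real frame would have many raw rows per aff_id/mkt pair).
df = pd.DataFrame({
    "aff_id": ["2508b863a1a4", "6cc5f0e8c96b", "188e238326e4",
               "188e238326e4", "6cc5f0e8c96b"],
    "mkt": ["bcab9d6ec630", "b7d0dbd38376", "446bb566f202",
            "dbe759c691eb", "0e9013464c4c"],
    "bkgs": [1910.707124, 1374.924684, 1206.589522, 1203.979908, 1203.532310],
})
df_gb_aff = df.groupby(["aff_id", "mkt"]).sum().sort_values("bkgs", ascending=False)

# per_aff is a hypothetical name: {aff_id: {mkt: bkgs}}.
per_aff = {}
# Iterating the bkgs Series yields ((aff_id, mkt), value) for existing rows only,
# so no lookup is ever made for a pair that is missing from the index.
for (aff_id, mkt), bkgs in df_gb_aff["bkgs"].items():
    per_aff.setdefault(aff_id, {})[mkt] = bkgs
```

After this runs, per_aff["188e238326e4"] holds only the two mkt values that exist for that affiliate.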

If you need to loop over the MultiIndex:

for idx in df_gb_aff.index:
    print(idx)
    # .loc replaces the removed .ix indexer
    print(df_gb_aff.loc[idx])

('2508b863a1a4', 'bcab9d6ec630')
bkgs    1910.707124
Name: (2508b863a1a4, bcab9d6ec630), dtype: float64
('6cc5f0e8c96b', 'b7d0dbd38376')
bkgs    1374.924684
Name: (6cc5f0e8c96b, b7d0dbd38376), dtype: float64
('188e238326e4', '446bb566f202')
bkgs    1206.589522
Name: (188e238326e4, 446bb566f202), dtype: float64
('188e238326e4', 'dbe759c691eb')
bkgs    1203.979908
Name: (188e238326e4, dbe759c691eb), dtype: float64
('6cc5f0e8c96b', '0e9013464c4c')
bkgs    1203.53231
Name: (6cc5f0e8c96b, 0e9013464c4c), dtype: float64
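
The lookups in the question fail because index.levels holds every distinct value of each level, not the pairs that actually occur, so looping over both lists produces combinations that are absent from the index. A hedged sketch of two ways around that: walk one aff_id at a time with groupby(level=...), or test membership before calling .loc. The sample frame is rebuilt from the question's rows, and the names mkt_to_bkgs, missing_pair and pair_exists are illustrative.

```python
import pandas as pd

# Sample data rebuilt from the rows shown in the question (an assumption).
df = pd.DataFrame({
    "aff_id": ["2508b863a1a4", "6cc5f0e8c96b", "188e238326e4",
               "188e238326e4", "6cc5f0e8c96b"],
    "mkt": ["bcab9d6ec630", "b7d0dbd38376", "446bb566f202",
            "dbe759c691eb", "0e9013464c4c"],
    "bkgs": [1910.707124, 1374.924684, 1206.589522, 1203.979908, 1203.532310],
})
df_gb_aff = df.groupby(["aff_id", "mkt"]).sum().sort_values("bkgs", ascending=False)

# Option 1: group on the first index level; each group contains only the
# mkt values that exist for that aff_id.
for aff_id, group in df_gb_aff.groupby(level="aff_id"):
    mkt_to_bkgs = group["bkgs"].droplevel("aff_id").to_dict()
    print(aff_id, mkt_to_bkgs)

# Option 2: check that a pair exists before looking it up with .loc.
missing_pair = ("2508b863a1a4", "0e9013464c4c")  # not a row in the sample data
pair_exists = missing_pair in df_gb_aff.index
if pair_exists:
    print(df_gb_aff.loc[missing_pair, "bkgs"])
```

Option 1 avoids the lookup problem altogether, while option 2 keeps the original double loop but skips the pairs that do not exist.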