Elegant way to iterate over a multi-level pandas DataFrame

Date: 2017-01-10 17:56:28

Tags: python python-3.x pandas numpy multi-index

I have a DataFrame:

import numpy as np
import pandas as pd

a = pd.DataFrame(np.random.rand(10, 6))
cols = [['A', 'B', 'C'], ['AA', 'BB']]
a.columns = pd.MultiIndex.from_product(
    cols,
    names=['first lvl', 'second level'])

This gives a DataFrame with MultiIndex columns, like this:

first lvl            A                   B                   C          
second level        AA        BB        AA        BB        AA        BB
0             0.991608  0.469706  0.338347  0.254777  0.739046  0.980094
1             0.039133  0.959985  0.718216  0.746632  0.341260  0.264836
2             0.164068  0.158672  0.175882  0.211732  0.146807  0.678957
3             0.324433  0.343780  0.269040  0.432309  0.469457  0.247455
4             0.932380  0.314262  0.439924  0.037954  0.641936  0.011523
5             0.608288  0.308212  0.680107  0.988747  0.349255  0.775298
6             0.082478  0.859175  0.546415  0.471169  0.013312  0.824054
7             0.244569  0.049261  0.194941  0.350334  0.203621  0.408066
8             0.132751  0.092825  0.237527  0.383277  0.288257  0.764209
9             0.417155  0.578300  0.325731  0.504903  0.718891  0.861813

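I would like to iterate over the columns A, B and C and run np.polyfit(AA, BB, deg=1) on each of them.

Is there an elegant and simple way to do this, other than the following?

cols = np.unique(a.columns.get_level_values(0))
beta = [np.polyfit(a[col]['AA'], a[col]['BB'], deg=1) for col in cols]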

2 answers:

Answer 0 (score: 1)

You can use groupby on the first level of the column MultiIndex and apply a custom function:

np.random.seed(100)
a = pd.DataFrame(np.random.rand(10, 6))
cols = [['A', 'B', 'C'], ['AA', 'BB']]
a.columns = pd.MultiIndex.from_product(cols, names=['first lvl', 'second level'])
print(a)
first lvl            A                   B                   C          
second level        AA        BB        AA        BB        AA        BB
0             0.543405  0.278369  0.424518  0.844776  0.004719  0.121569
1             0.670749  0.825853  0.136707  0.575093  0.891322  0.209202
2             0.185328  0.108377  0.219697  0.978624  0.811683  0.171941
3             0.816225  0.274074  0.431704  0.940030  0.817649  0.336112
4             0.175410  0.372832  0.005689  0.252426  0.795663  0.015255
5             0.598843  0.603805  0.105148  0.381943  0.036476  0.890412
6             0.980921  0.059942  0.890546  0.576901  0.742480  0.630184
7             0.581842  0.020439  0.210027  0.544685  0.769115  0.250695
8             0.285896  0.852395  0.975006  0.884853  0.359508  0.598859
9             0.354796  0.340190  0.178081  0.237694  0.044862  0.505431

print (a.groupby(level=0, axis=1)
        .apply(lambda x: np.polyfit(x[(x.name, 'AA')],
                                    x[(x.name, 'BB')], deg= 1)))
A    [-0.200103361495, 0.477549562164]
B     [0.415374076332, 0.473118297463]
C     [-0.33785161273, 0.551131295356]
dtype: object
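As far as I know, recent pandas releases deprecate the axis=1 argument of DataFrame.groupby, so the call above may warn or fail depending on the installed version. A minimal sketch that avoids it by looping over the first-level labels directly, using the same seed as above. The names groups, fits, slope and intercept are illustrative and are not part of the original answer:

```python
import numpy as np
import pandas as pd

np.random.seed(100)
a = pd.DataFrame(np.random.rand(10, 6))
a.columns = pd.MultiIndex.from_product(
    [['A', 'B', 'C'], ['AA', 'BB']],
    names=['first lvl', 'second level'])

# Unique labels of the first column level, in order of appearance
groups = a.columns.unique(level='first lvl')

# One degree-1 fit per group; np.polyfit returns [slope, intercept] for deg=1
fits = [np.polyfit(a[g]['AA'], a[g]['BB'], deg=1) for g in groups]

# 'slope' and 'intercept' are illustrative column names (an assumption)
beta = pd.DataFrame(fits, index=groups, columns=['slope', 'intercept'])
print(beta)
```

Collecting the coefficients in a DataFrame instead of an object Series keeps them numeric, so something like beta['slope'] can be used directly in later calculations.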

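Answer 1 (score: 1)

I suggest:

beta = [np.polyfit(a[col]['AA'], a[col]['BB'], deg=1) for col in a.stack().columns]

It does what you want in one line: a.stack() moves the innermost column level into the row index, so the remaining columns are just A, B and C.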
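To see why iterating over a.stack().columns works, the following sketch compares it with reading the labels from the column index itself. The variable names stacked_labels and index_labels are hypothetical, and reading the labels from the index is an alternative added here for comparison, not part of the answer:

```python
import numpy as np
import pandas as pd

np.random.seed(100)
a = pd.DataFrame(np.random.rand(10, 6))
a.columns = pd.MultiIndex.from_product(
    [['A', 'B', 'C'], ['AA', 'BB']],
    names=['first lvl', 'second level'])

# stack() pivots the innermost column level ('second level') into the rows,
# leaving only the first-level labels as columns
stacked_labels = list(a.stack().columns)

# The same labels taken from the column index, without reshaping any data
index_labels = list(a.columns.get_level_values('first lvl').unique())

print(stacked_labels)
print(index_labels)
```

Both lists hold the same labels here. The second form does not build a reshaped copy of the frame, which may matter for large data.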