How to create a Python DataFrame containing the mean of some rows of another DataFrame

Asked: 2018-02-07 13:27:59

Tags: python pandas

I have a pandas DataFrame containing some values:

                        id  pair      value  subdir
taylor_1e3c_1s_56C  taylor  6_13  -0.398716    run1 
taylor_1e3c_1s_56C  taylor  6_13  -0.397820    run2 
taylor_1e3c_1s_56C  taylor  6_13  -0.397310    run3 
taylor_1e3c_1s_56C  taylor  6_13  -0.390520    run4 
taylor_1e3c_1s_56C  taylor  6_13  -0.377390    run5 
taylor_1e3c_1s_56C  taylor  8_11  -0.393604    run1
taylor_1e3c_1s_56C  taylor  8_11  -0.392899    run2
taylor_1e3c_1s_56C  taylor  8_11  -0.392473    run3
taylor_1e3c_1s_56C  taylor  8_11  -0.389959    run4
taylor_1e3c_1s_56C  taylor  8_11  -0.387946    run5

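What I want to do is isolate the rows that share the same index, id, and pair, compute the mean of the value column, and put it all in a new DataFrame. Because I will then have effectively averaged over all possible values of subdir, that column should also be dropped. So the output should look like this: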

                        id  pair      value
taylor_1e3c_1s_56C  taylor  6_13  -0.392351
taylor_1e3c_1s_56C  taylor  8_11  -0.391376

How should I do this in pandas?

1 answer:

Answer 0: (score: 3)

Using syntactic sugar: group the value Series by the index together with the id and pair columns, then take the mean:

df = df['value'].groupby([df.index, df['id'], df['pair']]).mean().reset_index(level=[1,2])
print (df)
                        id  pair     value
taylor_1e3c_1s_56C  taylor  6_13 -0.392351
taylor_1e3c_1s_56C  taylor  8_11 -0.391376
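A minimal self-contained sketch of this approach, assuming the sample data from the question is rebuilt by hand. The names values_6_13, values_8_11, runs, and result are illustrative and not part of the original answer:

```python
import pandas as pd

# Sample data copied from the question; every row shares the same index label.
values_6_13 = [-0.398716, -0.397820, -0.397310, -0.390520, -0.377390]
values_8_11 = [-0.393604, -0.392899, -0.392473, -0.389959, -0.387946]
runs = ["run1", "run2", "run3", "run4", "run5"]

df = pd.DataFrame(
    {
        "id": ["taylor"] * 10,
        "pair": ["6_13"] * 5 + ["8_11"] * 5,
        "value": values_6_13 + values_8_11,
        "subdir": runs * 2,
    },
    index=["taylor_1e3c_1s_56C"] * 10,
)

# Group the value Series by the index labels plus the id and pair columns.
# subdir is never selected, so it does not appear in the result.
result = (
    df["value"]
    .groupby([df.index, df["id"], df["pair"]])
    .mean()
    .reset_index(level=[1, 2])  # move id and pair back into columns
)
print(result.round(6))
```

Assigning to a new name (result) instead of overwriting df keeps the original frame available for comparison.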

Classic solution: first call reset_index to turn the index into a column, then groupby the column names and aggregate with mean:

df = df.reset_index().groupby(['index','id','pair'])['value'].mean().reset_index(level=[1,2])
print (df)
                        id  pair     value
index                                     
taylor_1e3c_1s_56C  taylor  6_13 -0.392351
taylor_1e3c_1s_56C  taylor  8_11 -0.391376

Called on the original DataFrame, reset_index moves the index into a regular column named index:

print (df.reset_index())
                index      id  pair     value subdir
0  taylor_1e3c_1s_56C  taylor  6_13 -0.398716   run1
1  taylor_1e3c_1s_56C  taylor  6_13 -0.397820   run2
2  taylor_1e3c_1s_56C  taylor  6_13 -0.397310   run3
3  taylor_1e3c_1s_56C  taylor  6_13 -0.390520   run4
4  taylor_1e3c_1s_56C  taylor  6_13 -0.377390   run5
5  taylor_1e3c_1s_56C  taylor  8_11 -0.393604   run1
6  taylor_1e3c_1s_56C  taylor  8_11 -0.392899   run2
7  taylor_1e3c_1s_56C  taylor  8_11 -0.392473   run3
8  taylor_1e3c_1s_56C  taylor  8_11 -0.389959   run4
9  taylor_1e3c_1s_56C  taylor  8_11 -0.387946   run5

Details:

After aggregating with mean, you get a Series with a 3-level MultiIndex:

print (df.reset_index().groupby(['index','id','pair'])['value'].mean())
index               id      pair
taylor_1e3c_1s_56C  taylor  6_13   -0.392351
                            8_11   -0.391376
Name: value, dtype: float64

print (df.reset_index()
        .groupby(['index','id','pair'])['value']
        .mean()
        .reset_index(level=[1,2]))
                        id  pair     value
index                                     
taylor_1e3c_1s_56C  taylor  6_13 -0.392351
taylor_1e3c_1s_56C  taylor  8_11 -0.391376

So reset_index(level=[1,2]) is necessary to convert the second and third levels into columns, as the output above shows.
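The classic solution can also be followed one step at a time. The sketch below assumes the question's sample data is rebuilt by hand; the intermediate names flat, grouped, and means are hypothetical and only there to make each stage inspectable:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "id": ["taylor"] * 10,
        "pair": ["6_13"] * 5 + ["8_11"] * 5,
        "value": [-0.398716, -0.397820, -0.397310, -0.390520, -0.377390,
                  -0.393604, -0.392899, -0.392473, -0.389959, -0.387946],
        "subdir": ["run1", "run2", "run3", "run4", "run5"] * 2,
    },
    index=["taylor_1e3c_1s_56C"] * 10,
)

# Step 1: the unnamed index becomes an ordinary column called "index".
flat = df.reset_index()

# Step 2: grouping by three columns yields a Series with a 3-level MultiIndex.
grouped = flat.groupby(["index", "id", "pair"])["value"].mean()
print(grouped.index.names)

# Step 3: move levels 1 and 2 (id, pair) back to columns; level 0 stays as the index.
means = grouped.reset_index(level=[1, 2])
print(means.round(6))
```

Keeping level 0 as the index reproduces the layout requested in the question, with id, pair, and value as the only columns.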
