Adding an extra entry to a multi-index pandas DataFrame from another multi-index pandas DataFrame

Date: 2016-11-20 16:17:32

Tags: python pandas multi-index

I have a multi-index pandas DataFrame on which I used the groupby method followed by the describe method:

    grouped= self.HK_data.groupby(level=[0,1])
    summary= grouped.describe()

This gives:

Antibody        Time                
Customer_Col1A2 0    count  3.000000
                     mean   0.757589
                     std    0.188750
                     min    0.639933
                     25%    0.648732
                     50%    0.657532
                     75%    0.816417
                     max    0.975302
                10   count  3.000000
                     mean   0.716279
                     std    0.061939
                     min    0.665601
                     25%    0.681757
                     50%    0.697913
                     75%    0.741618
                     max    0.785324
                     ...   .........

I calculated the SEM using

    SEM=grouped.mean()/(numpy.sqrt(grouped.count()))

which gives:

Antibody                 Time          
Customer_Col1A2          0     0.437394
                         10    0.413544
                         120   0.553361
                         180   0.502792
                         20    0.512797
                         240   0.514609
                         30    0.505618
                         300   0.481021
                         45    0.534658
                         5     0.425800
                         60    0.430633
                         90    0.525115
                         ...  .........
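
A side note on the formula: the standard error of the mean is conventionally the sample standard deviation divided by sqrt(n), whereas the expression above divides the group mean. If the conventional definition is what is wanted, a minimal sketch could look like the following. The toy data and the column name `Value` are made up for illustration and are not the measurements shown above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the asker's measurements
df = pd.DataFrame({
    'Antibody': ['A'] * 6,
    'Time': [0, 0, 0, 10, 10, 10],
    'Value': [0.64, 0.66, 0.98, 0.67, 0.70, 0.79],
}).set_index(['Antibody', 'Time'])

grouped = df.groupby(level=[0, 1])

# Conventional SEM: sample standard deviation divided by sqrt(n)
sem_manual = grouped.std() / np.sqrt(grouped.count())

# Built-in equivalent; sem() uses ddof=1 by default, the same as std()
sem_builtin = grouped.sem()
```

Both results carry the same (Antibody, Time) index as `grouped.mean()`, so either can be dropped into the rest of the question unchanged.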

How can I concat these two frames so that SEM becomes another entry in the summary statistics?

Something like:

Antibody        Time                
Customer_Col1A2 0    count  3.000000
                     mean   0.757589
                     std    0.188750
                     min    0.639933
                     25%    0.648732
                     50%    0.657532
                     75%    0.816417
                     max    0.975302
                     SEM    0.437394
                10   count  3.000000
                     mean   0.716279
                     std    0.061939
                     min    0.665601
                     25%    0.681757
                     50%    0.697913
                     75%    0.741618
                     max    0.785324
                     SEM    0.413544

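I have tried pandas.concat, but it did not give me what I wanted.

Thanks!

1 answer:

Answer 0 (score: 2)

I think you first need to add a third level to the MultiIndex of SEM, assigning the new index with MultiIndex.from_tuples, and then use concat followed by sort_index: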

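import numpy as np
import pandas as pd

# small sample frame with the same index layout as in the question
HK_data = pd.DataFrame({'Antibody':['Customer_Col1A2','Customer_Col1A2','Customer_Col1A2'],
                   'Time':[0,10,10],
                   'Col':[7,8,9]})
HK_data = HK_data.set_index(['Antibody','Time'])
print (HK_data)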
                      Col
Antibody        Time     
Customer_Col1A2 0       7
                10      8
                10      9
grouped= HK_data.groupby(level=[0,1])
summary= grouped.describe()
print (summary)
                                 Col
Antibody        Time                
Customer_Col1A2 0    count  1.000000
                     mean   7.000000
                     std         NaN
                     min    7.000000
                     25%    7.000000
                     50%    7.000000
                     75%    7.000000
                     max    7.000000
                10   count  2.000000
                     mean   8.500000
                     std    0.707107
                     min    8.000000
                     25%    8.250000
                     50%    8.500000
                     75%    8.750000
                     max    9.000000

SEM=grouped.mean()/(np.sqrt(grouped.count()))
#append a third index level labelled 'SEM' so the rows line up with summary
new_index = list(zip(SEM.index.get_level_values('Antibody'),
                     SEM.index.get_level_values('Time'), 
                     ['SEM'] * len(SEM.index)))
SEM.index = pd.MultiIndex.from_tuples(new_index, names=('Antibody','Time', None))

print (SEM)
                               Col
Antibody        Time              
Customer_Col1A2 0    SEM  7.000000
                10   SEM  6.010408
df = pd.concat([summary, SEM]).sort_index()
print (df)
                                 Col
Antibody        Time                
Customer_Col1A2 0    25%    7.000000
                     50%    7.000000
                     75%    7.000000
                     SEM    7.000000
                     count  1.000000
                     max    7.000000
                     mean   7.000000
                     min    7.000000
                     std         NaN
                10   25%    8.250000
                     50%    8.500000
                     75%    8.750000
                     SEM    6.010408
                     count  2.000000
                     max    9.000000
                     mean   8.500000
                     min    8.000000
                     std    0.707107
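
Because sort_index orders the third level alphabetically, the result above does not keep the count, mean, std, ... order shown in the question, with SEM last. One possible alternative, sketched here under some assumptions, is to keep the statistics as columns, add SEM as one more column, and only then move the columns into a third index level with stack. This assumes pandas 0.20 or later, where describe() on a groupby returns one column per statistic instead of the stacked layout printed above, and it assumes a single value column (`Col`, as in the sample data). The names `wide` and `long` are just illustrative.

```python
import numpy as np
import pandas as pd

HK_data = pd.DataFrame({
    'Antibody': ['Customer_Col1A2'] * 3,
    'Time': [0, 10, 10],
    'Col': [7, 8, 9],
}).set_index(['Antibody', 'Time'])

grouped = HK_data.groupby(level=[0, 1])

# One row per (Antibody, Time), one column per statistic
# (assumes pandas >= 0.20, where describe() returns this wide layout)
wide = grouped['Col'].describe()

# Same expression as in the question; swap in grouped['Col'].sem()
# for the conventional std-based standard error
wide['SEM'] = grouped['Col'].mean() / np.sqrt(grouped['Col'].count())

# Turn the statistic columns into a third index level; the column order
# (count, mean, std, min, 25%, 50%, 75%, max, SEM) is preserved
long = wide.stack()
print(long)
```

Depending on the pandas version, stack may drop NaN entries (such as the std of a single-row group), so the group for Time 0 can end up with one row fewer than the group for Time 10.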