Combining two pandas data frames that have a multi-level index

Date: 2015-05-22 12:39:28

Tags: pandas

Here is the original data frame:

     Week_No  item_Number       Inside__Outside
4    1.2014   3164018114707537  INSIDE
6    1.2014   50010EJ654990     INSIDE
19   1.2014   304400JE130142    INSIDE
29   1.2014   3164018114725810  INSIDE
31   1.2014   3164018114711298  INSIDE
35   1.2014   3164018114707546  OUTSIDE
36   1.2014   3164018114711299  OUTSIDE
41   1.2014   3164018114727381  INSIDE
54   1.2014   50010EJ655470     OUTSIDE
145  1.2014   304400TS135379    INSIDE

After that I grouped it like this:

df = df.groupby(['Week_No','Inside__Outside']).agg(['count'])

which gives a grouped data frame:

                        item_Number
                              count
Week_No Inside__Outside
1.2014  INSIDE                   51
        OUTSIDE                   8
2.2014  INSIDE                   91
        OUTSIDE                  16
3.2014  INSIDE                   92
        OUTSIDE                   7
4.2014  INSIDE                   76
        OUTSIDE                   5

Now I have two data frames:

df1
                        item_Number
                              count
Week_No Inside__Outside
1.2015  INSIDE                   18
2.2015  INSIDE                   48
3.2015  INSIDE                   87
4.2015  INSIDE                   54
5.2015  INSIDE                   61
6.2015  INSIDE                   46
7.2015  INSIDE                   83
8.2015  INSIDE                   41
9.2015  INSIDE                   34

and

df2
                        item_Number
                              count
Week_No Inside__Outside
1.2015  OUTSIDE                   8
2.2015  OUTSIDE                   4
3.2015  OUTSIDE                   7
4.2015  OUTSIDE                   4
5.2015  OUTSIDE                   1
6.2015  OUTSIDE                   6
7.2015  OUTSIDE                   8
8.2015  OUTSIDE                   4
9.2015  OUTSIDE                   3

Now I want to sum them by week, i.e. combine the two data frames into a single output with one total per week.

I thought about selecting the data first and then adding it up manually, but that does not seem efficient. Also, because the index is multi-level, I cannot select data by Week_No. Please ignore the absolute numbers in the count column; my question is about operating on data frames with a multi-level index.
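
For reference, rows can be selected on a single level of a multi-level index with DataFrame.xs or .loc. The following is only a minimal sketch: the four-row frame is made up for illustration, the flat column name count is a simplification, and it assumes Week_No is stored as strings.

```python
import pandas as pd

# Hypothetical miniature of the grouped frame, indexed by
# (Week_No, Inside__Outside). Week_No is assumed to be a string here.
index = pd.MultiIndex.from_tuples(
    [("1.2014", "INSIDE"), ("1.2014", "OUTSIDE"),
     ("2.2014", "INSIDE"), ("2.2014", "OUTSIDE")],
    names=["Week_No", "Inside__Outside"],
)
grouped = pd.DataFrame({"count": [51, 8, 91, 16]}, index=index)

# All rows for one week; xs drops the selected level from the result.
week_1 = grouped.xs("1.2014", level="Week_No")

# The same rows via .loc on the outer level.
week_1_loc = grouped.loc["1.2014"]

# Selecting on the inner level instead: one row per week.
outside = grouped.xs("OUTSIDE", level="Inside__Outside")

print(week_1, outside, sep="\n\n")
```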

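2 answers:

Answer 0: (score: 0)

You have to take the Inside__Outside column out of the index, because you are not using it to join the two tables.

Let's start with the two data frames you provided in your example: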

data_1_df
Out[35]: 
                         item_Number count
Week_No Inside__Outside                   
1.2015  INSIDE                          18
2.2015  INSIDE                          48
3.2015  INSIDE                          87
4.2015  INSIDE                          54
5.2015  INSIDE                          61
6.2015  INSIDE                          46
7.2015  INSIDE                          83
8.2015  INSIDE                          41
9.2015  INSIDE                          34

data_2_df
Out[36]: 
                         item_Number count
Week_No Inside__Outside                   
1.2015  OUTSIDE                          8
2.2015  OUTSIDE                          4
3.2015  OUTSIDE                          7
4.2015  OUTSIDE                          4
5.2015  OUTSIDE                          1
6.2015  OUTSIDE                          6
7.2015  OUTSIDE                          8
8.2015  OUTSIDE                          4
9.2015  OUTSIDE                          3

You can stack one on top of the other, group on Week_No, and sum item_Number count:

import pandas as pd

data_3_df = (
    pd.concat([data_1_df, data_2_df])   # stack the two frames
    .reset_index()                      # move both index levels into columns
    .groupby('Week_No')
    .agg({'item_Number count': 'sum'})  # weekly total
)

This gives the sum of INSIDE and OUTSIDE for each week:

data_3_df
Out[52]: 
         item_Number count
Week_No                   
1.2015                  26
2.2015                  52
3.2015                  94
4.2015                  58
5.2015                  62
6.2015                  52
7.2015                  91
8.2015                  45
9.2015                  37
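
To try this end to end, the two example frames can be rebuilt by hand. This is a sketch under some assumptions: make_frame is a hypothetical helper, Week_No is stored as a string, and the count column is a single flat column named item_Number count, as in the printed output above.

```python
import pandas as pd

# Week labels "1.2015" .. "9.2015", assumed to be strings.
weeks = [f"{n}.2015" for n in range(1, 10)]


def make_frame(label, counts):
    """Hypothetical helper: frame indexed by (Week_No, Inside__Outside)."""
    index = pd.MultiIndex.from_product(
        [weeks, [label]], names=["Week_No", "Inside__Outside"]
    )
    return pd.DataFrame({"item_Number count": counts}, index=index)


data_1_df = make_frame("INSIDE", [18, 48, 87, 54, 61, 46, 83, 41, 34])
data_2_df = make_frame("OUTSIDE", [8, 4, 7, 4, 1, 6, 8, 4, 3])

data_3_df = (
    pd.concat([data_1_df, data_2_df])   # stack the rows of both frames
    .reset_index()                      # both index levels become columns
    .groupby("Week_No")                 # one group per week
    .agg({"item_Number count": "sum"})  # INSIDE + OUTSIDE per week
)
print(data_3_df)
```

Note that groupby sorts the group keys, so with string week labels a week such as 10.2015 would sort before 2.2015.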

Answer 1: (score: 0)

Just append them together and group by the first index level:

In [118]: df1
Out[118]: 
                        item_Number
                              count
Week_No Inside__Outside            
1.2015  INSIDE                   18
2.2015  INSIDE                   48
3.2015  INSIDE                   87
4.2015  INSIDE                   54
5.2015  INSIDE                   61
6.2015  INSIDE                   46
7.2015  INSIDE                   83
8.2015  INSIDE                   41
9.2015  INSIDE                   34

In [119]: df2
Out[119]: 
                        item_Number
                              count
Week_No Inside__Outside            
1.2015  OUTSIDE                   8
2.2015  OUTSIDE                   4
3.2015  OUTSIDE                   7
4.2015  OUTSIDE                   4
5.2015  OUTSIDE                   1
6.2015  OUTSIDE                   6
7.2015  OUTSIDE                   8
8.2015  OUTSIDE                   4
9.2015  OUTSIDE                   3

In [120]: df1.append(df2).groupby(level=0).sum()
Out[120]: 
        item_Number
              count
Week_No            
1.2015           26
2.2015           52
3.2015           94
4.2015           58
5.2015           62
6.2015           52
7.2015           91
8.2015           45
9.2015           37
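
DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions the same result can be obtained with pd.concat. The sketch below is one way to write it, not the answer's original code: make_frame is a hypothetical helper, Week_No is assumed to be a string, and the two-level column header (item_Number, count) is rebuilt from the printed output.

```python
import pandas as pd

# Week labels "1.2015" .. "9.2015", assumed to be strings.
weeks = [f"{n}.2015" for n in range(1, 10)]


def make_frame(label, counts):
    """Hypothetical helper: two-level index and two-level column header."""
    index = pd.MultiIndex.from_product(
        [weeks, [label]], names=["Week_No", "Inside__Outside"]
    )
    # A tuple key produces the (item_Number, count) column MultiIndex.
    return pd.DataFrame({("item_Number", "count"): counts}, index=index)


df1 = make_frame("INSIDE", [18, 48, 87, 54, 61, 46, 83, 41, 34])
df2 = make_frame("OUTSIDE", [8, 4, 7, 4, 1, 6, 8, 4, 3])

# pd.concat stands in for the removed DataFrame.append;
# level="Week_No" is the same as level=0 for these frames.
totals = pd.concat([df1, df2]).groupby(level="Week_No").sum()
print(totals)
```

Grouping on the index level directly avoids the reset_index step, since the Inside__Outside level is simply dropped by the aggregation.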