Filter a DataFrame for groups where one column is increasing while another is decreasing

Time: 2019-06-25 08:34:16

Tags: pandas pandas-groupby

I want to filter a DataFrame down to the groups in which one column is increasing while another column is decreasing.

>>> import pandas as pd
>>> df = pd.DataFrame({'A1': ['one', 'one', 'two', 'three', 'three', 'one', 'two'], 'A2': ['one1', 'one2', 'two1', 'three1', 'three2', 'one2', 'two2'], 'B': [1, 2, 0, 4, 3, 4, 5], 'C': [2, 1, 3, 3, 4, 8, 5]}) 
>>> df
      A1      A2  B  C
0    one    one1  1  2
1    one    one2  2  1
2    two    two1  0  3
3  three  three1  4  3
4  three  three2  3  4
5    one    one2  4  8
6    two    two2  5  5
>>> aggregator = {'B': {'sB': 'sum', 'cB': 'count'}, 'C': {'sC' : 'sum','cC':'count'}}
>>> df1 = df.groupby(["A1", "A2"]).agg(aggregator)
>>> df1
              C     B   
             sC cC sB cB
A1    A2                
one   one1    2  1  1  1
      one2    9  2  6  2
three three1  3  1  4  1
      three2  4  1  3  1
two   two1    3  1  0  1
      two2    5  1  5  1
>>> df2 = df.groupby("A1").agg(aggregator)
>>> df2
        C     B   
       sC cC sB cB
A1                
one    11  3  7  3
three   7  2  7  2
two     8  2  5  2
>>> df3 = df1.div(df2, level="A1")*100                                                                                                                                                                         
>>> df3
                      C                      B           
                     sC         cC          sB         cB
A1    A2                                                 
one   one1    18.181818  33.333333   14.285714  33.333333
      one2    81.818182  66.666667   85.714286  66.666667
three three1  42.857143  50.000000   57.142857  50.000000
      three2  57.142857  50.000000   42.857143  50.000000
two   two1    37.500000  50.000000    0.000000  50.000000
      two2    62.500000  50.000000  100.000000  50.000000
>>> 

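Now, in df3 above, I want the groups where sC is increasing but sB is decreasing. To be clear: for group one, sC goes from 18 to 81 and sB goes from 14 to 85, so both increase (group two shows a similar pattern). For group three, sC increases (from 42 to 57) while sB decreases (from 57 to 42). I want to filter the data so that I only get groups like three.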

So the expected output is:

>>> df3
                      C                      B           
                     sC         cC          sB         cB
A1    A2                                                 
three three1  42.857143  50.000000   57.142857  50.000000
      three2  57.142857  50.000000   42.857143  50.000000

Please help.
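
One possible way to do this, offered as a minimal sketch rather than a definitive answer: compute the percentages with flat column names, then keep only the A1 groups whose sC values strictly rise and whose sB values strictly fall from one row to the next. The names `sums`, `pct`, `c_up_b_down` and `result` are made up for this illustration. Two assumptions are built in: rows are compared in the sorted A2 order that groupby produces, and a group with a single row is treated as having no trend. Named aggregation is used because the nested-dict renaming in the aggregator above is no longer accepted as of pandas 1.0; the count columns are left out since the filter does not need them.

```python
import pandas as pd

df = pd.DataFrame({
    "A1": ["one", "one", "two", "three", "three", "one", "two"],
    "A2": ["one1", "one2", "two1", "three1", "three2", "one2", "two2"],
    "B": [1, 2, 0, 4, 3, 4, 5],
    "C": [2, 1, 3, 3, 4, 8, 5],
})

# Named aggregation (pandas >= 0.25) gives flat column names sB and sC.
sums = df.groupby(["A1", "A2"]).agg(sB=("B", "sum"), sC=("C", "sum"))

# Percentage of each (A1, A2) row within its A1 total.
pct = sums.div(sums.groupby(level="A1").transform("sum")) * 100


def c_up_b_down(group):
    """Return True if sC strictly rises and sB strictly falls within one A1 group."""
    steps = group[["sC", "sB"]].diff().iloc[1:]  # row-to-row changes
    if steps.empty:  # assumption: a single-row group has no trend
        return False
    return bool((steps["sC"] > 0).all() and (steps["sB"] < 0).all())


# Keep every row of the A1 groups that satisfy the condition.
result = pct.groupby(level="A1").filter(c_up_b_down)
print(result)
```

With the sample data only group three should survive the filter. If ties should count as increasing or decreasing, the strict comparisons can be relaxed to `>=` and `<=`.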

0 answers:

No answers yet.