I want to filter a DataFrame down to the groups in which one column is increasing while another column is decreasing.
>>> import pandas as pd
>>> df = pd.DataFrame({'A1': ['one', 'one', 'two', 'three', 'three', 'one', 'two'], 'A2': ['one1', 'one2', 'two1', 'three1', 'three2', 'one2', 'two2'], 'B': [1, 2, 0, 4, 3, 4, 5], 'C': [2, 1, 3, 3, 4, 8, 5]})
>>> df
      A1      A2  B  C
0    one    one1  1  2
1    one    one2  2  1
2    two    two1  0  3
3  three  three1  4  3
4  three  three2  3  4
5    one    one2  4  8
6    two    two2  5  5
>>> aggregator = {'B': {'sB': 'sum', 'cB': 'count'}, 'C': {'sC' : 'sum','cC':'count'}}
>>> df1 = df.groupby(["A1", "A2"]).agg(aggregator)
>>> df1
              C     B
             sC cC sB cB
A1    A2
one   one1    2  1  1  1
      one2    9  2  6  2
three three1  3  1  4  1
      three2  4  1  3  1
two   two1    3  1  0  1
      two2    5  1  5  1
>>> df2 = df.groupby("A1").agg(aggregator)
>>> df2
        C     B
       sC cC sB cB
A1
one    11  3  7  3
three   7  2  7  2
two     8  2  5  2
>>> df3 = df1.div(df2, level="A1")*100
>>> df3
                      C                     B
                     sC         cC          sB         cB
A1    A2
one   one1    18.181818  33.333333   14.285714  33.333333
      one2    81.818182  66.666667   85.714286  66.666667
three three1  42.857143  50.000000   57.142857  50.000000
      three2  57.142857  50.000000   42.857143  50.000000
two   two1    37.500000  50.000000    0.000000  50.000000
      two2    62.500000  50.000000  100.000000  50.000000
>>>
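Now, in df3 above, I want only the groups where sC is increasing but sB is decreasing. To be clear: for group one, sC goes from 18 to 81 and sB for that group goes from 14 to 85, so both increase (group two shows a similar pattern). For group three, sC increases (from 42 to 57) while sB decreases (from 57 to 42). I want to filter the data so that I get only groups like three.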
So the expected output is:
>>> df3
                      C                    B
                     sC         cC         sB         cB
A1    A2
three three1  42.857143  50.000000  57.142857  50.000000
      three2  57.142857  50.000000  42.857143  50.000000
Please help.
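One possible approach, offered as a minimal sketch rather than a definitive answer: group df3 by the A1 index level and keep only the groups where sC is monotonically increasing and sB is monotonically decreasing, using `GroupBy.filter`. Two assumptions are worth flagging. First, the sketch rebuilds the frames with named aggregation and flat column names (sC, cC, sB, cB), because the nested-dict renaming used in the question was removed in pandas 1.0; the percentages are the same, only the column layout differs. Second, the helper name `rising_c_falling_b` is hypothetical, and the check is non-strict, so a group with equal consecutive values would also pass.

```python
import pandas as pd

df = pd.DataFrame({
    'A1': ['one', 'one', 'two', 'three', 'three', 'one', 'two'],
    'A2': ['one1', 'one2', 'two1', 'three1', 'three2', 'one2', 'two2'],
    'B': [1, 2, 0, 4, 3, 4, 5],
    'C': [2, 1, 3, 3, 4, 8, 5],
})

# Named aggregation yields flat columns sC, cC, sB, cB
# (assumption: this replaces the nested-dict aggregator from the question).
named = dict(sC=('C', 'sum'), cC=('C', 'count'),
             sB=('B', 'sum'), cB=('B', 'count'))
df1 = df.groupby(['A1', 'A2']).agg(**named)
df2 = df.groupby('A1').agg(**named)

# Share of each (A1, A2) row within its A1 group, in percent.
df3 = df1.div(df2, level='A1') * 100

def rising_c_falling_b(group):
    # Hypothetical helper: True when sC never decreases and sB never
    # increases across the rows of one A1 group (non-strict monotonicity).
    return (group['sC'].is_monotonic_increasing
            and group['sB'].is_monotonic_decreasing)

# Keep every row of the groups that satisfy the condition.
result = df3.groupby(level='A1').filter(rising_c_falling_b)
print(result)
```

With the sample data, only the rows of group three should survive the filter. If strictly increasing and decreasing values are required, comparing `group['sC'].diff().dropna()` against zero inside the helper would be one way to tighten the check.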