I want to generate sector-wise (group-wise) pairs from a DataFrame, based on the values in its Score column.
+--------+--------+-------+
| Ticker | Sector | Score |
+--------+--------+-------+
| ABC    | Energy |   3.5 |
| XYZ    | Energy |   4.5 |
| PQR    | Tech   |   5.5 |
| MNP    | Tech   |   1.5 |
| JKL    | Energy |  10.5 |
| BCA    | Energy |   8.5 |
| RDB    | Tech   |   6.5 |
| JMP    | Tech   |   2.5 |
+--------+--------+-------+
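In the example above, JKL / ABC in Energy would be one such pair, because JKL has the highest score and ABC the lowest score in that sector. Similarly, the next pair in Energy would be BCA / XYZ, because BCA is the second highest and XYZ the second lowest in that sector.
As a next step, I want to keep, within each sector, only those pairs whose score difference is greater than some threshold.
Thanks for your help.
The output could be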
+--------+--------+--------+
| Ticker | Sector | Result |
+--------+--------+--------+
| ABC    | Energy |      0 |
| XYZ    | Energy |      0 |
| PQR    | Tech   |      1 |
| MNP    | Tech   |      0 |
| JKL    | Energy |      1 |
| BCA    | Energy |      1 |
| RDB    | Tech   |      1 |
| JMP    | Tech   |      0 |
+--------+--------+--------+
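For anyone who wants to reproduce the example, the input table can be built roughly as follows. This is only a sketch that copies the values from the first table; the variable name `df` is an assumption, chosen because it is the name the code in the answers uses.

```python
import pandas as pd

# Values copied from the question's input table.
df = pd.DataFrame({
    "Ticker": ["ABC", "XYZ", "PQR", "MNP", "JKL", "BCA", "RDB", "JMP"],
    "Sector": ["Energy", "Energy", "Tech", "Tech",
               "Energy", "Energy", "Tech", "Tech"],
    "Score": [3.5, 4.5, 5.5, 1.5, 10.5, 8.5, 6.5, 2.5],
})

# Quick look at each sector's tickers, ordered from lowest to highest score.
ordered = df.sort_values(["Sector", "Score"])
print(ordered)
```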
Answer 0 (score: 1)
Is this what you want?
import pandas as pd

(
    df.groupby('Sector')
      # idxmin/idxmax return index labels, so look the tickers up with .loc
      .apply(lambda x: pd.Series({
          'Low Ticker': x.loc[x.Score.idxmin(), 'Ticker'],
          'High Ticker': x.loc[x.Score.idxmax(), 'Ticker'],
          'Low': x.Score.min(),      # lowest score in the sector
          'High': x.Score.max()}))   # highest score in the sector
      .assign(Diff=lambda x: x.High - x.Low)
)
Out[653]:
          Low Ticker High Ticker  Low  High Diff
Sector
Energy           ABC         JKL  3.5  10.5  7.0
Utilities        MNP         RDB  1.5   6.5  5.0
You can then filter on the Diff column to keep, within each sector, only those pairs whose difference is greater than some threshold.
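A minimal sketch of that filtering step, assuming Diff is meant to be the score gap between the lowest and highest ticker of each sector. The data is copied from the question; the names `extremes` and `kept` and the threshold value of 6.0 are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Ticker": ["ABC", "XYZ", "PQR", "MNP", "JKL", "BCA", "RDB", "JMP"],
    "Sector": ["Energy", "Energy", "Tech", "Tech",
               "Energy", "Energy", "Tech", "Tech"],
    "Score": [3.5, 4.5, 5.5, 1.5, 10.5, 8.5, 6.5, 2.5],
})

# Lowest and highest score per sector, via named aggregation.
extremes = df.groupby("Sector")["Score"].agg(Low="min", High="max")
extremes["Diff"] = extremes["High"] - extremes["Low"]

threshold = 6.0  # hypothetical cutoff, not from the original post
kept = extremes[extremes["Diff"] > threshold]
print(kept)
```

Note that this keeps only one pair per sector (the two extremes); it does not cover the second-highest / second-lowest pairs the question also asks about.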
Answer 1 (score: 0)
Here is what I would do
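df = df.sort_values('Score')  # ascending, so each sector runs low to high
df = df.assign(New=df.groupby('Sector').cumcount() % 2)  # alternate 0/1 per sector
# pair label; matches lowest with highest when a sector has four rows, as here
df = df.assign(New2=df.groupby('Sector').New.transform(
    lambda x: x.cumsum().replace(0, len(x) // 2)))
df.groupby(['Sector', 'New2']).Ticker.apply(list)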
Out[1464]:
Sector     New2
Energy     1       [XYZ, BCA]
           2       [ABC, JKL]
Utilities  1       [JMP, PQR]
           2       [MNP, RDB]
Name: Ticker, dtype: object
Then
# flag the higher-scoring ticker of each pair
df['Result'] = (df.Score == df.groupby(['Sector', 'New2']).Score.transform('max')).astype(int)
df.sort_index()
Out[1471]:
  Ticker     Sector  Score  New  New2  Result
0    ABC     Energy    3.5    0     2       0
1    XYZ     Energy    4.5    1     1       0
2    PQR  Utilities    5.5    0     1       1
3    MNP  Utilities    1.5    0     2       0
4    JKL     Energy   10.5    1     2       1
5    BCA     Energy    8.5    0     1       1
6    RDB  Utilities    6.5    1     2       1
7    JMP  Utilities    2.5    1     1       0
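Edit: added the diff, as requested by the OP
# df is still sorted by Score here, so diff() within a pair is high minus low
df['DIFF'] = df.groupby(['Sector', 'New2']).Score.transform(lambda x: x.diff().bfill())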
df.sort_index()
Out[1479]:
  Ticker     Sector  Score  New  New2  Result  DIFF
0    ABC     Energy    3.5    0     2       0   7.0
1    XYZ     Energy    4.5    1     1       0   4.0
2    PQR  Utilities    5.5    0     1       1   3.0
3    MNP  Utilities    1.5    0     2       0   5.0
4    JKL     Energy   10.5    1     2       1   7.0
5    BCA     Energy    8.5    0     1       1   4.0
6    RDB  Utilities    6.5    1     2       1   5.0
7    JMP  Utilities    2.5    1     1       0   3.0
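The New / New2 labelling above lines up lowest with highest for the four-ticker sectors in this example, but it may not generalise to larger sectors. One possible alternative, sketched here rather than taken from the answer, is to rank the scores within each sector and give the i-th lowest and i-th highest the same pair id. The helper `label_pairs`, the `Pair` column and the threshold of 4.5 are all hypothetical, and the data uses the sector names from the question's table.

```python
import pandas as pd

df = pd.DataFrame({
    "Ticker": ["ABC", "XYZ", "PQR", "MNP", "JKL", "BCA", "RDB", "JMP"],
    "Sector": ["Energy", "Energy", "Tech", "Tech",
               "Energy", "Energy", "Tech", "Tech"],
    "Score": [3.5, 4.5, 5.5, 1.5, 10.5, 8.5, 6.5, 2.5],
})


def label_pairs(scores: pd.Series) -> pd.Series:
    """Give the i-th lowest and i-th highest score the same 0-based pair id."""
    rank = scores.rank(method="first").astype(int) - 1  # 0 = lowest score
    mirrored = len(scores) - 1 - rank                    # 0 = highest score
    # Element-wise minimum of the two ranks.
    return rank.where(rank < mirrored, mirrored)


df["Pair"] = df.groupby("Sector")["Score"].transform(label_pairs).astype(int)

pair_scores = df.groupby(["Sector", "Pair"])["Score"]
high = pair_scores.transform("max")
low = pair_scores.transform("min")
df["Diff"] = high - low

threshold = 4.5  # hypothetical cutoff, not from the original post
# 1 for the higher-scoring ticker of each pair whose gap exceeds the threshold.
df["Result"] = ((df["Score"] == high) & (df["Diff"] > threshold)).astype(int)
print(df)
```

With an odd number of tickers in a sector, the middle one ends up alone in its pair with a Diff of 0, so it is never flagged.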