在分组后找到熊猫python中每个组中最接近的

时间:2018-12-08 07:47:42

标签: python pandas

我是熊猫的新手,并且一直坚持计算。这是我的样本DF。我有兴趣获取每个组最近的收盘价价格的 strike_price 行。 (按日期,时间,option_type分组)

        name      date   time      open      high       low     close       option_type  strike_price  open_opn  high_opn  low_opn  close_opn 
0       NIFTY  20180903  09:16  11736.05  11736.10  11699.35  11720.15      CE              11800       104.95    109.45   104.50     107.25 
1       NIFTY  20180903  09:16  11736.05  11736.10  11699.35  11720.15      CE              11700       160.00    164.25   159.90     161.60
2       NIFTY  20180903  09:16  11736.05  11736.10  11699.35  11720.15      CE              11600       223.20    229.10   223.20     226.30  
0       NIFTY  20180903  09:16  11736.05  11736.10  11699.35  11720.15      PE              11800       166.05    166.95   163.55     165.95  
1       NIFTY  20180903  09:16  11736.05  11736.10  11699.35  11720.15      PE              11600       88.80     89.45    86.45      89.00
2       NIFTY  20180903  09:16  11736.05  11736.10  11699.35  11720.15      PE              11700       122.35    122.65   119.40     122.30

这是我的预期输出。

        name      date   time      open      high       low     close       option_type  strike_price  open_opn  high_opn  low_opn  close_opn 

1       NIFTY  20180903  09:16  11736.05  11736.10  11699.35  11700.15      CE              11700       160.00    164.25   159.90     161.60
2       NIFTY  20180903  09:16  11736.05  11736.10  11699.35  11700.15      PE              11700       122.35    122.65   119.40     122.30

如果收盘价为11760,则预期o / p为。

0       NIFTY  20180903  09:16  11736.05  11736.10  11699.35  11720.15      CE              11800       104.95    109.45   104.50     107.25
0       NIFTY  20180903  09:16  11736.05  11736.10  11699.35  11720.15      PE              11800       166.05    166.95   163.55     165.95

请帮助我。高度赞赏。

1 个答案:

答案 0 :(得分:1)

首先将subabs一起使用,然后通过sort_valuesdrop_duplicates来获得每组的最小值:

df['diff'] = df['close'].sub(df['strike_price']).abs()

df = df.sort_values('diff').drop_duplicates(['date', 'time', 'option_type'])
print (df)
    name      date   time      open     high       low     close option_type  \
1  NIFTY  20180903  09:16  11736.05  11736.1  11699.35  11720.15          CE   
2  NIFTY  20180903  09:16  11736.05  11736.1  11699.35  11720.15          PE   

   strike_price  open_opn  high_opn  low_opn  close_opn   diff  
1         11700    160.00    164.25    159.9      161.6  20.15  
2         11700    122.35    122.65    119.4      122.3  20.15  

DataFrameGroupBy.idxminloc结合使用的另一种解决方案:

df = df.reset_index(drop=True)
df['diff'] = df['close'].sub(df['strike_price']).abs()

df = df.loc[df.groupby(['date', 'time', 'option_type'])['diff'].idxmin()]
print (df)

    name      date   time      open     high       low     close option_type  \
1  NIFTY  20180903  09:16  11736.05  11736.1  11699.35  11720.15          CE   
5  NIFTY  20180903  09:16  11736.05  11736.1  11699.35  11720.15          PE   

   strike_price  open_opn  high_opn  low_opn  close_opn   diff  
1         11700    160.00    164.25    159.9      161.6  20.15  
5         11700    122.35    122.65    119.4      122.3  20.15