Merge and transform a pd.df based on a condition

Time: 2017-10-11 08:27:55

Tags: pandas boolean python-3.6

I'm looking to consolidate the following DataFrame (DF) as described below:

       Expiry           F      K Type     sigma
0  2017-10-27  125.109375  123.5    P  0.045410
1  2017-10-27  125.109375  127.5    P  0.047965
2  2017-10-27  125.109375  124.5    P  0.041822
3  2017-10-27  125.109375  125.5    P  0.041526
4  2017-10-27  125.109375  120.5    P  0.045410
5  2017-10-27  125.109375  121.5    P  0.045410
6  2017-10-27  125.109375  121.0    P  0.045410
7  2017-10-27  125.109375  122.0    P  0.045410
8  2017-10-27  125.109375  123.0    P  0.045410
9  2017-10-27  125.109375  124.0    P  0.043341
10 2017-10-27  125.109375  125.0    P  0.041143
11 2017-10-27  125.109375  126.0    P  0.043123
12 2017-10-27  125.109375  127.0    P  0.047965
13 2017-10-27  125.109375  128.0    P  0.047965
14 2017-10-27  125.109375  128.5    P  0.047965
15 2017-10-27  125.109375  129.0    P  0.047965
16 2017-10-27  125.109375  129.5    P  0.047965
17 2017-10-27  125.109375  130.0    P  0.047965
18 2017-10-27  125.109375  126.5    P  0.046020
19 2017-10-27  125.109375  122.5    P  0.045410
20 2017-10-27  125.109375  123.5    C  0.045410
21 2017-10-27  125.109375  127.5    C  0.047965
22 2017-10-27  125.109375  124.5    C  0.041822
23 2017-10-27  125.125000  125.5    C  0.041629
24 2017-10-27  125.125000  120.5    C  0.045487
25 2017-10-27  125.125000  121.5    C  0.045487
26 2017-10-27  125.125000  121.0    C  0.045487
27 2017-10-27  125.125000  122.0    C  0.045487
28 2017-10-27  125.125000  123.0    C  0.045487
29 2017-10-27  125.125000  124.0    C  0.043292
..        ...         ...    ...  ...       ...
70 2017-11-03  125.109375  125.0    C  0.040830
71 2017-11-03  125.109375  126.0    C  0.042517
72 2017-11-03  125.109375  127.0    C  0.046631
73 2017-11-03  125.109375  128.0    C  0.046631
74 2017-11-03  125.109375  128.5    C  0.046631
75 2017-11-03  125.109375  129.0    C  0.046631
76 2017-11-03  125.109375  129.5    C  0.046631
77 2017-11-03  125.109375  130.0    C  0.046631
78 2017-11-03  125.109375  126.5    C  0.044948
79 2017-11-03  125.109375  122.5    C  0.044366
80 2017-10-20  125.109375  123.5    P  0.046512
81 2017-10-20  125.109375  127.5    P  0.048400
82 2017-10-20  125.109375  124.5    P  0.041512
83 2017-10-20  125.109375  125.5    P  0.042744
84 2017-10-20  125.109375  120.5    P  0.046512
85 2017-10-20  125.109375  121.5    P  0.046512
86 2017-10-20  125.109375  121.0    P  0.046512
87 2017-10-20  125.109375  122.0    P  0.046512
88 2017-10-20  125.109375  123.0    P  0.046512
89 2017-10-20  125.109375  124.0    P  0.044166
90 2017-10-20  125.109375  125.0    P  0.041220
91 2017-10-20  125.109375  126.0    P  0.045406
92 2017-10-20  125.109375  127.0    P  0.048400
93 2017-10-20  125.109375  128.0    P  0.048400
94 2017-10-20  125.109375  128.5    P  0.048400
95 2017-10-20  125.109375  129.0    P  0.048400
96 2017-10-20  125.109375  129.5    P  0.048400
97 2017-10-20  125.109375  130.0    P  0.048400
98 2017-10-20  125.109375  126.5    P  0.048400
99 2017-10-20  125.109375  122.5    P  0.046512

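Select sigma according to the following condition: if F > K, use the sigma of the Type = P row; otherwise, use the sigma of the Type = C row. The result I'm looking for should look like this: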

              123  123.5    124  124.5    125  125.5    126  126.5 
Expiry                                                                         
2017-10-20  0.051  0.047  0.043   0.04  0.040  0.039  0.041  0.043
2017-10-27  0.045  0.041  0.041   0.04  0.039  0.039  0.040  0.042
.....
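The selection rule can also be written directly as a boolean mask. This is a minimal sketch, not the answer's method: it uses only six rows copied from the table above (expiry 2017-10-27, strikes 124.0, 124.5 and 125.5), and it assumes that for every (Expiry, K) pair the P row and the C row agree on whether F > K. In the data, F differs slightly between some P and C rows (125.109375 vs 125.125), so a strike falling between the two values could leave this mask with zero or two rows for that pair.

```python
import pandas as pd

# Six rows copied from the question's table (expiry 2017-10-27)
df = pd.DataFrame({
    'Expiry': pd.to_datetime(['2017-10-27'] * 6),
    'F': [125.109375, 125.109375, 125.109375, 125.125, 125.109375, 125.125],
    'K': [124.0, 124.5, 125.5, 124.0, 124.5, 125.5],
    'Type': ['P', 'P', 'P', 'C', 'C', 'C'],
    'sigma': [0.043341, 0.041822, 0.041526, 0.043292, 0.041822, 0.041629],
})

# Keep P rows where F > K and C rows where F <= K.
# Assumption: P and C rows of the same (Expiry, K) agree on F > K,
# so exactly one row survives per pair.
keep = ((df['Type'] == 'P') & (df['F'] > df['K'])) | \
       ((df['Type'] == 'C') & (df['F'] <= df['K']))

# One row per expiry, one column per strike
result = df[keep].pivot(index='Expiry', columns='K', values='sigma')
# result: sigma 0.043341 (K=124.0, P), 0.041822 (K=124.5, P), 0.041629 (K=125.5, C)
```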

Thanks a lot for your help.

1 answer:

Answer 0 (score: 1)

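import pandas as pd

# df is the DataFrame from the question (columns Expiry, F, K, Type, sigma).
# Auxiliary column: 1 for rows the condition allows (P rows with F > K, and every C row as the fallback), else 0.
# Option 1: row-wise apply (readable, but slow on large frames)
df['to_use'] = df.apply(lambda x: 1 if (x['F'] > x['K'] and x['Type'] == 'P') or x['Type'] == 'C' else 0, axis=1)
# Option 2: vectorised boolean logic (fastest); produces the same column, so either line is enough
df['to_use'] = (((df['F'] > df['K']) & (df['Type'] == 'P')) | (df['Type'] == 'C')).astype(int)

# Sort so that, within each (Expiry, K) pair, allowed rows come first and,
# when both rows are allowed, P comes before C ('P' > 'C' in descending order)
df.sort_values(by=['Expiry', 'K', 'to_use', 'Type'], ascending=[True, True, False, False], inplace=True)

# Keep one row per (Expiry, K): the P row if F > K, otherwise the C row
new_df = df.drop_duplicates(subset=['Expiry', 'K'], keep='first')

# Present the result as a pivot table: one row per expiry, one column per strike
# (append .reset_index() if you want Expiry as a regular column instead of the index)
result = new_df.pivot_table(index='Expiry', columns='K', values='sigma')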
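To check the answer's steps end to end, here is a self-contained run on a small sample: six rows copied from the question's table (expiry 2017-10-27, strikes 124.0, 124.5 and 125.5). It uses the vectorised to_use column and reassigns the sorted frame instead of sorting in place; otherwise it follows the answer's sort, deduplicate and pivot steps.

```python
import pandas as pd

# Six rows copied from the question's table (expiry 2017-10-27)
df = pd.DataFrame({
    'Expiry': pd.to_datetime(['2017-10-27'] * 6),
    'F': [125.109375, 125.109375, 125.109375, 125.125, 125.109375, 125.125],
    'K': [124.0, 124.5, 125.5, 124.0, 124.5, 125.5],
    'Type': ['P', 'P', 'P', 'C', 'C', 'C'],
    'sigma': [0.043341, 0.041822, 0.041526, 0.043292, 0.041822, 0.041629],
})

# 1 for P rows with F > K and for every C row (the fallback), else 0
df['to_use'] = (((df['F'] > df['K']) & (df['Type'] == 'P')) | (df['Type'] == 'C')).astype(int)

# Allowed rows first within each (Expiry, K); P before C on ties
df = df.sort_values(by=['Expiry', 'K', 'to_use', 'Type'],
                    ascending=[True, True, False, False])

# First row of each (Expiry, K) pair is the one to use
new_df = df.drop_duplicates(subset=['Expiry', 'K'], keep='first')

result = new_df.pivot_table(index='Expiry', columns='K', values='sigma')
# K=124.0 and K=124.5 take the P sigma (F > K); K=125.5 takes the C sigma
```

Because every C row is marked usable, each (Expiry, K) pair always yields a row, even when the P and C rows carry slightly different F values.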