Python DataFrame: return the highest and lowest values for each group defined by a separate column

Date: 2020-07-02 11:54:19

Tags: python pandas dataframe

Say I have the following DataFrame:

import pandas as pd
df = pd.DataFrame()

df['close'] = (7980,7996,8855,8363,8283,8303,8266,8582,8586,8179,8206,7854,8145,8152,8240,8373,8319,8298,8048,8218,8188,8055,8432,8537,9682,10021,9985,10169,10272,10152,10196,10270,10306,10355,10969,10420,10154,10096,10307,10400,10484)

df['A'] = ('TDOWN','TDOWN', 'TDOWN', 'TOP', 'TOP', 'TOP', 'TOP', 'TOP','BUP','BUP','BUP', 'BUP', 'BUP', 'BOTTOM', 'BOTTOM', 'BOTTOM', 'BUP','BUP','BUP','BUP', 'BOTTOM', 'BOTTOM', 'BUP','BUP','BUP', 'BUP','BUP','BUP','BUP', 'BOTTOM', 'BOTTOM', 'BOTTOM', 'BOTTOM','TDOWN','TDOWN', 'TDOWN', 'TOP', 'TOP', 'TOP', 'TOP', 'TOP')

print(df)

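For each group of 'TOP' rows and each group of 'BOTTOM' rows, I want to return the highest close in the 'TOP' group and the lowest close in the 'BOTTOM' group. Below is the result I'm trying to achieve: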

df['outcome1'] = ('-','-', '-', '-', '-', '-', '-', '8582','-','-','-', '-', '-', '8152', '-', '-', '-','-','-','-', '-', '8055', '-','-','-', '-','-','-','-', '10152', '-', '-', '-','-','-', '-', '-', '-', '-', '-', '10484')

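You'll notice that the numbers in column outcome1 appear only on certain rows, depending on column A: each one is the highest close in a 'TOP' group or the lowest close in a 'BOTTOM' group.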
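One way to check the expected outcome1 values is to split column A into consecutive runs of identical labels and look at the lowest and highest close in each run. The sketch below assumes the df built above; run_summary is a hypothetical helper name, and treating a "group" as a run of consecutive identical labels is an assumption about what the question means.

```python
import pandas as pd


def run_summary(df):
    """Hypothetical helper: one row per consecutive run of labels in column A."""
    # Keep the original row label in a column named 'index'
    d = df.reset_index()
    # Start a new run id whenever the label differs from the previous row
    run = d['A'].ne(d['A'].shift()).cumsum().rename('run')
    return d.groupby(run).agg(
        label=('A', 'first'),
        start=('index', 'first'),
        end=('index', 'last'),
        low=('close', 'min'),
        high=('close', 'max'),
    )
```

For example, print(run_summary(df)) lists every run with its first and last row; for the TOP runs the high column is the expected value, and for the BOTTOM runs the low column is.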

How can I code this so that I get the same result as column outcome1?

Thanks

3 answers:

Answer 0 (score: 3)

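We can achieve this in three steps:

  1. First, we create groups for the TOP and BOTTOM runs.
  2. Then we take the max of each TOP group and the min of each BOTTOM group.
  3. Finally, we combine the maximums and minimums with fillna.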
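# Each row that is not TOP/BOTTOM starts a new group, so a run of TOP or
# BOTTOM rows shares the group id of the row just before it
grps = (~df['A'].isin(['TOP', 'BOTTOM'])).cumsum()
# Max close over the TOP rows of each group, broadcast back to every row
top = df.where(df['A'].eq('TOP')).groupby(grps)['close'].transform('max')
# Min close over the BOTTOM rows of each group
bottom = df.where(df['A'].eq('BOTTOM')).groupby(grps)['close'].transform('min')
values = top.fillna(bottom)

# Keep close only where it equals its group's extreme (df['close'] keeps ints)
df['outcome1'] = df['close'].where(values.eq(df['close']), '-')
print(df)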

    close       A outcome1
0    7980   TDOWN        -
1    7996   TDOWN        -
2    8855   TDOWN        -
3    8363     TOP        -
4    8283     TOP        -
5    8303     TOP        -
6    8266     TOP        -
7    8582     TOP     8582
8    8586     BUP        -
9    8179     BUP        -
10   8206     BUP        -
11   7854     BUP        -
12   8145     BUP        -
13   8152  BOTTOM     8152
14   8240  BOTTOM        -
15   8373  BOTTOM        -
16   8319     BUP        -
17   8298     BUP        -
18   8048     BUP        -
19   8218     BUP        -
20   8188  BOTTOM        -
21   8055  BOTTOM     8055
22   8432     BUP        -
23   8537     BUP        -
24   9682     BUP        -
25  10021     BUP        -
26   9985     BUP        -
27  10169     BUP        -
28  10272     BUP        -
29  10152  BOTTOM    10152
30  10196  BOTTOM        -
31  10270  BOTTOM        -
32  10306  BOTTOM        -
33  10355   TDOWN        -
34  10969   TDOWN        -
35  10420   TDOWN        -
36  10154     TOP        -
37  10096     TOP        -
38  10307     TOP        -
39  10400     TOP        -
40  10484     TOP    10484
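The grouping key in step 1 only starts a new group on rows that are neither TOP nor BOTTOM, so each TOP or BOTTOM run shares an id with the row just before it. That works for the data in the question, but if a TOP run were immediately followed by a BOTTOM run, both would land in one group and top.fillna(bottom) would fill the BOTTOM rows with the TOP maximum. The small example below uses hypothetical labels (not from the question) to compare that key with a run-length key built from shift and cumsum.

```python
import pandas as pd

# Hypothetical labels: a TOP run followed directly by a BOTTOM run
s = pd.Series(['BUP', 'TOP', 'TOP', 'BOTTOM', 'BUP'])

# Key from the answer above: only non-TOP/BOTTOM rows start a new group
grps = (~s.isin(['TOP', 'BOTTOM'])).cumsum()
print(grps.tolist())  # [1, 1, 1, 1, 2] -> TOP and BOTTOM share group 1

# Run-length key: a new id every time the label changes
runs = s.ne(s.shift()).cumsum()
print(runs.tolist())  # [1, 2, 2, 3, 4] -> TOP and BOTTOM are separate runs
```

Passing the run-length key to both groupby calls keeps the TOP and BOTTOM runs apart; the other runs then contain only NaN after df.where, so their transform result is NaN and they end up as '-'.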

Answer 1 (score: 1)

Use:

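import numpy as np

# New id every time the label in column A changes (one id per consecutive run)
g = df['A'].ne(df['A'].shift()).cumsum().rename('run')

# Row label of the max and min close in every (label, run) group, in long format:
# columns A, level_1 ('idxmax' or 'idxmin') and idx (the row label)
df1 = (df.groupby(['A', g])['close']
         .agg(['idxmax','idxmin'])
         .stack()
         .reset_index(level=1, drop=True)
         .reset_index(name='idx'))

# Keep idxmin for BOTTOM runs and idxmax for TOP runs
df1['mask'] = ((df1['A'].eq('BOTTOM') & df1['level_1'].eq('idxmin')) |
               (df1['A'].eq('TOP') & df1['level_1'].eq('idxmax')))
print(df1)

mask = df.index.isin(df1.loc[df1['mask'], 'idx'])

df['new'] = np.where(mask, df['close'], '-')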
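The same idea can be written without stack by filtering to the TOP and BOTTOM rows first and calling idxmax and idxmin per run. This is a sketch, not the answer's code: extremes_by_run is a hypothetical helper name, and it returns NaN instead of '-' so the column stays numeric (apply .fillna('-') only for display). On ties, idxmax and idxmin return the first matching row label.

```python
import pandas as pd


def extremes_by_run(df):
    """Hypothetical helper: close of the max row in each TOP run and of the
    min row in each BOTTOM run; NaN on every other row."""
    # New run id whenever the label in column A changes
    run = df['A'].ne(df['A'].shift()).cumsum()
    is_top = df['A'].eq('TOP')
    is_bottom = df['A'].eq('BOTTOM')
    # Row labels of the extreme row in each run (first row wins on ties)
    top_idx = df.loc[is_top, 'close'].groupby(run[is_top]).idxmax()
    bottom_idx = df.loc[is_bottom, 'close'].groupby(run[is_bottom]).idxmin()
    keep = df.index.isin(top_idx.tolist() + bottom_idx.tolist())
    return df['close'].where(keep)


# Small hypothetical frame: one TOP run and one BOTTOM run
demo = pd.DataFrame({'close': [5, 9, 7, 3, 4, 2, 6],
                     'A': ['BUP', 'TOP', 'TOP', 'BUP', 'BOTTOM', 'BOTTOM', 'BUP']})
print(extremes_by_run(demo).tolist())  # [nan, 9.0, nan, nan, nan, 2.0, nan]
```

On the question's data, extremes_by_run(df) keeps a value only on rows 7, 13, 21, 29 and 40.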

Answer 2 (score: 0)

An alternative answer:

# Import libraries
import numpy as np
import pandas as pd

# Create DataFrame
df = pd.DataFrame()
df['close'] = (7980,7996,8855,8363,8283,8303,8266,8582,8586,8179,8206,7854,8145,8152,8240,8373,8319,8298,8048,8218,8188,8055,8432,8537,9682,10021,9985,10169,10272,10152,10196,10270,10306,10355,10969,10420,10154,10096,10307,10400,10484)
df['A'] = ('TDOWN','TDOWN', 'TDOWN', 'TOP', 'TOP', 'TOP', 'TOP', 'TOP','BUP','BUP','BUP', 'BUP', 'BUP', 'BOTTOM', 'BOTTOM', 'BOTTOM', 'BUP','BUP','BUP','BUP', 'BOTTOM', 'BOTTOM', 'BUP','BUP','BUP', 'BUP','BUP','BUP','BUP', 'BOTTOM', 'BOTTOM', 'BOTTOM', 'BOTTOM','TDOWN','TDOWN', 'TDOWN', 'TOP', 'TOP', 'TOP', 'TOP', 'TOP')

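# Create a flag for each run of identical labels (row 0 stays NaN)
c = 0
df['flag'] = np.nan
for i in range(df.shape[0]-1):
    if df['A'].iloc[i] == df['A'].iloc[i+1]:
        # .loc avoids chained assignment, which may not write back to df
        df.loc[i+1, 'flag'] = c
    else:
        c += 1
        df.loc[i+1, 'flag'] = c
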
# Create grouped object
g = df.groupby(['flag'], as_index=False)


# Get Highest and Lowest
g_max = g.max()
g_max = g_max[g_max['A']=='TOP']
    
g_min = g.min()
g_min = g_min[g_min['A']=='BOTTOM']

# Combine Highest and lowest
dfg = pd.concat([g_max, g_min])
dfg = dfg.drop('flag', axis=1)
dfg['outcome1'] = dfg['close']
dfg

# Merge with original DataFrame
dfnew = df.merge(dfg, on=['A','close'], how='left').fillna('-')

Output

dfnew

    close       A flag outcome1
0    7980   TDOWN    -        -
1    7996   TDOWN    0        -
2    8855   TDOWN    0        -
3    8363     TOP    1        -
4    8283     TOP    1        -
5    8303     TOP    1        -
6    8266     TOP    1        -
7    8582     TOP    1     8582
8    8586     BUP    2        -
9    8179     BUP    2        -
10   8206     BUP    2        -
11   7854     BUP    2        -
12   8145     BUP    2        -
13   8152  BOTTOM    3     8152
14   8240  BOTTOM    3        -
15   8373  BOTTOM    3        -
16   8319     BUP    4        -
17   8298     BUP    4        -
18   8048     BUP    4        -
19   8218     BUP    4        -
20   8188  BOTTOM    5        -
21   8055  BOTTOM    5     8055
22   8432     BUP    6        -
23   8537     BUP    6        -
24   9682     BUP    6        -
25  10021     BUP    6        -
26   9985     BUP    6        -
27  10169     BUP    6        -
28  10272     BUP    6        -
29  10152  BOTTOM    7    10152
30  10196  BOTTOM    7        -
31  10270  BOTTOM    7        -
32  10306  BOTTOM    7        -
33  10355   TDOWN    8        -
34  10969   TDOWN    8        -
35  10420   TDOWN    8        -
36  10154     TOP    9        -
37  10096     TOP    9        -
38  10307     TOP    9        -
39  10400     TOP    9        -
40  10484     TOP    9    10484
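Two parts of this answer could be made more robust. The flag loop is equivalent to a run id built with shift and cumsum (except that row 0 gets flag 0 instead of NaN), and merging back on ['A', 'close'] would also tag a row from a different run that happens to share the same label and close value. Below is a minimal sketch that tests each row against its own run with groupby(...).transform instead, assuming the same df; outcome_via_flags is a hypothetical name. Unlike idxmax, this marks every row on a tie.

```python
import pandas as pd


def outcome_via_flags(df):
    """Hypothetical helper: add 'flag' and 'outcome1' columns without a loop or merge."""
    # 0-based id of each consecutive run of labels (the loop leaves row 0 as NaN)
    flag = df['A'].ne(df['A'].shift()).cumsum().rename('flag') - 1
    by_flag = df.groupby(flag)['close']
    # Max and min close of each row's own run, broadcast to every row of the run
    run_max = by_flag.transform('max')
    run_min = by_flag.transform('min')
    hit = ((df['A'].eq('TOP') & df['close'].eq(run_max)) |
           (df['A'].eq('BOTTOM') & df['close'].eq(run_min)))
    # Each row is compared with its own run, so no merge on (label, close) is needed
    return df.assign(flag=flag, outcome1=df['close'].where(hit, '-'))


demo = pd.DataFrame({'close': [5, 9, 7, 3, 4, 2, 6],
                     'A': ['BUP', 'TOP', 'TOP', 'BUP', 'BOTTOM', 'BOTTOM', 'BUP']})
print(outcome_via_flags(demo))
```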