Python / Pandas: computing 1. the minimum, 2. the maximum to the left of the minimum, 3. the maximum to the right of the minimum

Date: 2020-09-18 20:59:26

Tags: python pandas

This is a continuation of Python/ Pandas: Finding a left and right max

I have a dataframe containing a time series of data. Here is a sample:

idx  Q12000     Q22000     Q32000     Q42000     Q12001     Q22001     Q32001     Q42001     Q12002     Q22002     Q32002     Q42002
0    4085280.0  4114911.0  4108089.0  4111713.0  4055699.0  4076430.0  4043219.0  4039370.0  4201158.0  4243119.0  4231823.0  4254681.0
1    21226.0    21566.0    21804.0    22072.0    21924.0    23232.0    22748.0    22258.0    22614.0    22204.0    22500.0    22660.0
2    96400.0    102000.0   98604.0    97086.0    96354.0    103054.0   97824.0    95958.0    115938.0   123064.0   120406.0   120648.0
3    23820.0    24116.0    24186.0    23726.0    23504.0    23574.0    23162.0    23078.0    22306.0    22334.0    22152.0    22080.0
4    7838.0     7906.0     7714.0     7676.0     7480.0     7520.0     7102.0     6722.0     8324.0     8166.0     8208.0     8326.0

For the analysis, I need to compute the following values for each row:

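  • nadir: the lowest point (min)
  • nadir_qtr: the quarter in which the nadir occurred
  • pre-peak: the highest point before the nadir
  • pre-peak_qtr: the quarter in which the pre-peak occurred
  • post-peak: the highest point after the nadir
  • post-peak_qtr: the quarter in which the post-peak occurred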
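As a minimal sketch of these six definitions, the snippet below works through a single row (row 4 of the sample data). The variable names are illustrative only. It assumes the "after" slice includes the nadir itself, which mirrors the helper functions in this question, and it does not handle the case where the nadir falls in the first quarter.

```python
import pandas as pd

# Row 4 of the sample data, indexed by quarter label.
quarters = ["Q12000", "Q22000", "Q32000", "Q42000",
            "Q12001", "Q22001", "Q32001", "Q42001",
            "Q12002", "Q22002", "Q32002", "Q42002"]
row = pd.Series([7838.0, 7906.0, 7714.0, 7676.0, 7480.0, 7520.0,
                 7102.0, 6722.0, 8324.0, 8166.0, 8208.0, 8326.0],
                index=quarters)

nadir = row.min()                    # lowest value in the row
nadir_qtr = row.idxmin()             # label of the quarter holding the nadir
pos = row.index.get_loc(nadir_qtr)   # integer position of that quarter

before = row.iloc[:pos]              # quarters strictly before the nadir
after = row.iloc[pos:]               # nadir and everything after it (assumption)

pre_peak, pre_peak_qtr = before.max(), before.idxmax()
post_peak, post_peak_qtr = after.max(), after.idxmax()
```

For this row the values agree with row 4 of the results table further down: a nadir of 6722.0 at position 7, a pre-peak of 7906.0, and a post-peak of 8326.0.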

With help from the previous post, I used the following helper functions:

from io import StringIO
import pandas as pd

def calc_nadir(s):
    assert isinstance(s, pd.Series)
    return s.min()

def calc_nadir_qtr(s):
    return s.argmin()

def calc_pre_peak(s):
    return s[ : s.argmin()].max()

def calc_pre_peak_quarter(s):
    try:
        qtr = s[ : s.argmin()].argmax()
    except ValueError:  # empty slice: the nadir is in the first column
        qtr = None
    return qtr

def calc_post_peak(s):
    return s[s.argmin() : ].max()

def calc_post_peak_qtr(s):
    return s[s.argmin() : ].argmax() + s.argmin()

nadir = df.apply(lambda x: calc_nadir(x), axis=1).rename('nadir')
nadir_qtr = df.apply(lambda x: calc_nadir_qtr(x), axis=1).rename('nadir_qtr')

pre_peak = df.apply(lambda x: calc_pre_peak(x), axis=1).rename('pre_peak')
pre_peak_qtr = df.apply(lambda x: calc_pre_peak_quarter(x), axis=1).rename('pre_peak_qtr')

post_peak = df.apply(lambda x: calc_post_peak(x), axis=1).rename('post_peak')
post_peak_qtr = df.apply(lambda x: calc_post_peak_qtr(x), axis=1).rename('post_peak_qtr')

results = pd.concat([nadir, nadir_qtr, pre_peak, pre_peak_qtr, 
                     post_peak, post_peak_qtr], axis=1)
print(results)

       nadir  nadir_qtr   pre_peak  pre_peak_qtr  post_peak  post_peak_qtr
0  4039370.0          7  4114911.0           1.0  4254681.0             11
1    21226.0          0        NaN           NaN    23232.0              5
2    95958.0          7   103054.0           5.0   123064.0              9
3    22080.0         11    24186.0           2.0    22080.0             11
4     6722.0          7     7906.0           1.0     8326.0             11

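The trouble I have is with the second row. It makes no sense for the nadir to be the first column, so I changed the code above to only look for the nadir after the first few columns.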

nadir = df.iloc[:,6:].apply(lambda x: calc_nadir(x), axis=1).rename('nadir')
nadir_qtr = df.iloc[:,6:].apply(lambda x: calc_nadir_qtr(x), axis=1).rename('nadir_qtr')
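One thing that may be worth checking here: a position found inside a sliced row is relative to the slice, so it probably needs to be shifted by the number of skipped columns before it is used against the full row. The sketch below illustrates this on row 1 of the sample data. `START` is a hypothetical name for the number of leading columns skipped.

```python
import pandas as pd

START = 6  # hypothetical: number of leading columns excluded from the nadir search

# Row 1 of the sample data (positional index only).
row = pd.Series([21226.0, 21566.0, 21804.0, 22072.0, 21924.0, 23232.0,
                 22748.0, 22258.0, 22614.0, 22204.0, 22500.0, 22660.0])

tail = row.iloc[START:]
rel_pos = int(tail.to_numpy().argmin())  # position within the slice
abs_pos = rel_pos + START                # position within the full row

nadir = tail.min()
pre_peak = row.iloc[:abs_pos].max()      # highest value before the nadir
```

With the offset applied, the pre-peak slice is no longer empty for this row, so `pre_peak` comes out as a number rather than NaN.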

This seems to work well. But I am stuck on how to get pre_peak to replace the NaN.

I tried iterating over the rows, but had no luck. I still get exactly the same NaNs.

    for index, row in df.iterrows():
        if not row['pre_peak']:
            slice = row['nadir_qtr'][index]
            row['pre_peak'] = row.iloc[1:slice].max(axis=0)
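Two things likely explain why a loop like this leaves the NaNs untouched: NaN is truthy in Python, so `not row['pre_peak']` is False for a missing value, and `iterrows()` yields a copy of each row, so assigning to `row` does not write back to the frame. A minimal sketch of a loop that avoids both problems follows. The two-column frame and the fill rule (max over the quarter columns) are hypothetical placeholders, not the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame: two quarters plus a pre_peak column with a gap.
df = pd.DataFrame({
    "Q12000": [10.0, 5.0],
    "Q22000": [8.0, 7.0],
    "pre_peak": [10.0, np.nan],
})

# NaN is truthy, so `not NaN` is False and a `not row[...]` test skips it.
nan_is_falsy = not np.nan

for index, row in df.iterrows():
    if pd.isna(row["pre_peak"]):
        # Assign through the frame; `row` is only a copy of the data.
        df.at[index, "pre_peak"] = row[["Q12000", "Q22000"]].max()
```

Here `pd.isna` detects the missing value and `df.at` writes the replacement into the frame itself.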

Any suggestions are appreciated.

1 Answer:

Answer 0: (score: 1)

You can select only the columns after the first one (the code below uses df.iloc[:,1:]) and use a bunch of pandas methods such as .min, .max, idxmin, idxmax, and others:

df['nadir'] = df.iloc[:,1:].min(axis=1)
df['nadir_qtr'] = df.iloc[:,1:].idxmin(axis=1).apply(lambda x: df.columns.get_loc(x))
df['new'] = [df.iloc[i].values for i in df.index]
df['pre_peak'] = df.apply(lambda x: max(x['new'][0:x['nadir_qtr']]), axis=1)
df['post_peak'] = df.apply(lambda x: max(x['new'][x['nadir_qtr']:]), axis=1)
df['pre_peak_qtr'] = pd.Series([s[i] for i, s in zip(df.index, df['pre_peak'].apply(
    lambda x: [i for i in (df.iloc[:,0:-6] == x)
               .idxmax(axis=1)]))]).apply(lambda x: df.columns.get_loc(x))
df['post_peak_qtr'] = pd.Series([s[i] for i, s in zip(df.index, df['post_peak'].apply(
    lambda x: [i for i in (df.iloc[:,0:-6] == x)
               .idxmax(axis=1)]))]).apply(lambda x: df.columns.get_loc(x))
df_new = df[['nadir', 'nadir_qtr', 'pre_peak', 'pre_peak_qtr', 'post_peak', 'post_peak_qtr']]
df_new

Out[1]:
         nadir  nadir_qtr   pre_peak  pre_peak_qtr  post_peak  post_peak_qtr
idx
0    4039370.0          7  4114911.0             1  4254681.0             11
1      21566.0          1    21226.0             0    23232.0              5
2      95958.0          7   103054.0             5   123064.0              9
3      22080.0         11    24186.0             2    22080.0             11
4       6722.0          7     7906.0             1     8326.0             11
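For comparison, the same six values could also be produced in a single pass per row. The sketch below is an alternative under stated assumptions, not the answerer's code: `summarize` and its `start` parameter are hypothetical names, `start` is the first column position allowed to hold the nadir (at least 1, so the "before" slice is never empty), and the post-peak slice includes the nadir as in the code above. It is demonstrated on rows 1 and 3 of the sample data.

```python
import pandas as pd

def summarize(row, start=1):
    """Return nadir/pre-peak/post-peak values and positions for one row.

    `start` (hypothetical parameter) is the first position allowed to be the nadir.
    """
    values = row.to_numpy()
    pos = int(values[start:].argmin()) + start  # nadir position in the full row
    pre = values[:pos]                          # strictly before the nadir
    post = values[pos:]                         # nadir and everything after it
    return pd.Series({
        "nadir": values[pos],
        "nadir_qtr": pos,
        "pre_peak": pre.max(),
        "pre_peak_qtr": int(pre.argmax()),
        "post_peak": post.max(),
        "post_peak_qtr": int(post.argmax()) + pos,
    })

quarters = ["Q12000", "Q22000", "Q32000", "Q42000",
            "Q12001", "Q22001", "Q32001", "Q42001",
            "Q12002", "Q22002", "Q32002", "Q42002"]
# Rows 1 and 3 of the sample data.
df = pd.DataFrame(
    [[21226.0, 21566.0, 21804.0, 22072.0, 21924.0, 23232.0,
      22748.0, 22258.0, 22614.0, 22204.0, 22500.0, 22660.0],
     [23820.0, 24116.0, 24186.0, 23726.0, 23504.0, 23574.0,
      23162.0, 23078.0, 22306.0, 22334.0, 22152.0, 22080.0]],
    columns=quarters,
)

# apply() upcasts the positions to float, so cast them back to int.
result = df.apply(summarize, axis=1).astype(
    {"nadir_qtr": int, "pre_peak_qtr": int, "post_peak_qtr": int}
)
```

Because each row is handled by one function, the nadir position is computed once and reused for both slices, which avoids looking values up again by equality.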