Checking result trends with NumPy

Time: 2018-08-25 09:16:07

Tags: python python-3.x pandas numpy group-by

I have a DataFrame like the one below:

PROCESS_NO  PROCESS_NAME     RESULT_2    RESULT_3 
10254       AAA              4.40        46.67 
10254       AAA              4.45        48.33 
10254       AAA              4.50        50.00 
10254       AAA              4.45        48.33 
10254       AAA              4.50        50.00 
10255       BBB              4.50        50.00 
10255       BBB              4.50        50.00 
10254       AAA              4.45        48.33 
10254       AAA              4.45        48.33 
10254       AAA              4.45        48.33 
10255       BBB              4.50        51.60 
10255       BBB              4.50        52.80 
10255       BBB              4.50        56.80 
10255       BBB              4.50        51.70 
10255       BBB              4.46        57.90 
10255       BBB              4.44        52.00 

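Grouping by PROCESS_NO and PROCESS NAME, I want to check whether each RESULT_2 and RESULT_3 value is equal to or greater than the values in the previous 3 rows, and store True or False in a separate column for each.

I want a result DataFrame like this: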

PROCESS_NO  PROCESS NAME    RESULT_2    CHECK_2 RESULT_3    CHECK_3
10254       AAA             4.40        FALSE   46.67       FALSE 
10254       AAA             4.45        FALSE   48.33       FALSE 
10254       AAA             4.45        TRUE    48.33       TRUE
10254       AAA             4.45        TRUE    48.33       TRUE
10254       AAA             4.45        TRUE    48.33       TRUE
10254       AAA             4.50        TRUE    50.00       TRUE
10254       AAA             4.45        FALSE   48.33       FALSE
10254       AAA             4.50        TRUE    50.00       TRUE
10255       BBB             4.50        FALSE   50.00       FALSE
10255       BBB             4.50        FALSE   50.00       FALSE
10255       BBB             4.50        TRUE    51.60       TRUE
10255       BBB             4.50        TRUE    52.80       TRUE
10255       BBB             4.50        TRUE    56.80       TRUE
10255       BBB             4.50        TRUE    51.70       FALSE
10255       BBB             4.46        FALSE   57.90       TRUE
10255       BBB             4.44        FALSE   52.00       FALSE

2 answers:

Answer 0 (score: 5)

Without NumPy, and keeping it as simple as possible:

import pandas as pd

data = [[10254,'AAA',4.40,46.67],
        [10255,'BBB',4.50,50.00],
        [10255,'BBB',4.50,50.00],
        [10254,'AAA',4.45,48.33],
        [10254,'AAA',4.50,50.00],
        [10254,'AAA',1.50,10.00],]
dataframe = pd.DataFrame(data, columns=['PROCESS_NO','PROCESS NAME','RESULT_2','RESULT_3'])
dataframe['CHECK_2'] = 'FALSE'
dataframe['CHECK_3'] = 'FALSE'
check2_position = dataframe.columns.get_loc('CHECK_2')
check3_position = dataframe.columns.get_loc('CHECK_3')
# Rows 0-2 have fewer than three predecessors and keep the default 'FALSE'.
for i in range(3, len(dataframe)):
    # Mark 'TRUE' when the value is >= at least one of the previous three rows.
    # Rows are compared in frame order, without grouping by process.
    current_result2 = dataframe.iloc[i]['RESULT_2']
    if (current_result2 >= dataframe.iloc[i-1]['RESULT_2'] or
            current_result2 >= dataframe.iloc[i-2]['RESULT_2'] or
            current_result2 >= dataframe.iloc[i-3]['RESULT_2']):
        dataframe.iat[i, check2_position] = 'TRUE'

    current_result3 = dataframe.iloc[i]['RESULT_3']
    if (current_result3 >= dataframe.iloc[i-1]['RESULT_3'] or
            current_result3 >= dataframe.iloc[i-2]['RESULT_3'] or
            current_result3 >= dataframe.iloc[i-3]['RESULT_3']):
        dataframe.iat[i, check3_position] = 'TRUE'

print(dataframe)

The result is as desired:

   PROCESS_NO PROCESS NAME  RESULT_2  RESULT_3 CHECK_2 CHECK_3
0       10254          AAA      4.40     46.67   FALSE   FALSE
1       10255          BBB      4.50     50.00   FALSE   FALSE
2       10255          BBB      4.50     50.00   FALSE   FALSE
3       10254          AAA      4.45     48.33    TRUE    TRUE
4       10254          AAA      4.50     50.00    TRUE    TRUE
5       10254          AAA      1.50     10.00   FALSE   FALSE
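
The loop above walks the rows in frame order and accepts a value that is at least as large as any one of the three rows before it. If the check should instead run within each process and require the value to be at least as large as all of the previous rows, a minimal sketch could use groupby with shift. The helper name check_not_below_previous, the n_prev parameter, and the sample rows are assumptions made for illustration; they are not part of the question.

```python
import pandas as pd

def check_not_below_previous(df, value_col, group_cols, n_prev=3):
    """True where the value is >= each of the previous n_prev rows of its group."""
    grouped = df.groupby(group_cols)[value_col]
    result = pd.Series(True, index=df.index)
    for k in range(1, n_prev + 1):
        # Value k rows earlier within the same group (NaN if there is none).
        previous = grouped.shift(k)
        # A comparison with NaN is False, so rows lacking k predecessors end up False.
        result &= df[value_col] >= previous
    return result

# Hypothetical sample in the shape of the question's data.
df = pd.DataFrame({
    'PROCESS_NO': [10254, 10254, 10254, 10255, 10254,
                   10255, 10254, 10255, 10255, 10255],
    'PROCESS NAME': ['AAA', 'AAA', 'AAA', 'BBB', 'AAA',
                     'BBB', 'AAA', 'BBB', 'BBB', 'BBB'],
    'RESULT_2': [4.40, 4.45, 4.50, 4.50, 4.50,
                 4.50, 4.45, 4.50, 4.44, 4.50],
})
df['CHECK_2'] = check_not_below_previous(
    df, 'RESULT_2', ['PROCESS_NO', 'PROCESS NAME'])
print(df)
```

This version leaves a row False until three earlier rows exist in its group. The desired table in the question marks the third row of each group TRUE, so a smaller n_prev may be closer to what is wanted; that part of the requirement is ambiguous.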

Hope this helps.

Cheers.

Answer 1 (score: 2)

Try this.

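# `df` is the DataFrame from the question.
def greater_than(df_grp):
    # raw=True passes each window as a NumPy array, so x[2] is the current row.
    # Note: the windows are taken from the full frame `df`, not from `df_grp`,
    # and x[:1] compares the current value only with the first value in the window.
    df_grp['CHECK_2'] = df['RESULT_2'].rolling(3).apply(lambda x: all(x[2] >= i for i in x[:1]), raw=True)
    df_grp['CHECK_3'] = df['RESULT_3'].rolling(3).apply(lambda x: all(x[2] >= i for i in x[:1]), raw=True)
    # The first rows have no full window (NaN); treat them as 0.
    df_grp[['CHECK_2','CHECK_3']] = df_grp[['CHECK_2','CHECK_3']].fillna(0).astype(int)
    return df_grp

result = df.groupby(['PROCESS_NO', 'PROCESS NAME']).apply(greater_than)
print(result)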

Output

    PROCESS_NO PROCESS NAME  RESULT_2  RESULT_3  CHECK_2  CHECK_3
0        10254          AAA      4.40     46.67        0        0
1        10255          BBB      4.50     50.00        0        0
2        10255          BBB      4.50     50.00        1        1
3        10254          AAA      4.45     48.33        0        0
4        10254          AAA      4.45     48.33        0        0
5        10254          AAA      4.45     48.33        1        1
6        10254          AAA      4.45     48.33        1        1
7        10254          AAA      4.50     50.00        1        1
8        10255          BBB      4.50     51.60        1        1
9        10255          BBB      4.50     52.80        1        1
10       10255          BBB      4.50     56.80        1        1
11       10255          BBB      4.50     51.70        1        0
12       10255          BBB      4.50     57.90        1        1
13       10255          BBB      4.50     52.00        1        1
14       10254          AAA      4.45     48.33        0        0
15       10254          AAA      4.50     50.00        1        0

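Explanation

  • First, groupby is applied on ['PROCESS_NO', 'PROCESS NAME']
  • A custom function is applied to each group; it receives the group's DataFrame as its argument
  • pandas rolling then slides a window of 3 values that ends at the current row
  • The lambda checks whether the last value in the window is greater than or equal to the earlier value it is compared with (0 means "FALSE", 1 means "TRUE")
  • Rows at the start, where a full window of 3 values is not yet available, are filled with 0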
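
Since the function above reads from df rather than df_grp, its windows are not restricted to one process. A sketch of a variant that keeps the rolling window inside each group is shown below, assuming the current value must be at least as large as every one of the previous n_prev rows. The names last_not_below_rest, rolling_check and n_prev, and the sample rows, are hypothetical.

```python
import pandas as pd

def last_not_below_rest(window):
    # With raw=True the window arrives as a NumPy array;
    # its last element is the current row.
    return float((window[-1] >= window[:-1]).all())

def rolling_check(df, value_col, group_cols, n_prev=3):
    # transform keeps the original index, so the result lines up with df row by row.
    rolled = df.groupby(group_cols)[value_col].transform(
        lambda s: s.rolling(n_prev + 1).apply(last_not_below_rest, raw=True)
    )
    # Rows without a full window come back as NaN and are treated as False.
    return rolled.fillna(0).astype(bool)

# Hypothetical sample in the shape of the question's data.
df = pd.DataFrame({
    'PROCESS_NO': [10254, 10254, 10255, 10254, 10255,
                   10254, 10255, 10255, 10254],
    'PROCESS NAME': ['AAA', 'AAA', 'BBB', 'AAA', 'BBB',
                     'AAA', 'BBB', 'BBB', 'AAA'],
    'RESULT_2': [4.40, 4.45, 4.50, 4.50, 4.50, 4.45, 4.50, 4.50, 4.50],
    'RESULT_3': [46.67, 48.33, 50.00, 50.00, 51.60, 48.33, 52.80, 51.70, 50.00],
})
group_cols = ['PROCESS_NO', 'PROCESS NAME']
df['CHECK_2'] = rolling_check(df, 'RESULT_2', group_cols)
df['CHECK_3'] = rolling_check(df, 'RESULT_3', group_cols)
print(df)
```

The window size is n_prev + 1 because the window includes the current row. Passing n_prev=2 would give the same window of 3 that the answer uses.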

For more information, see: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rolling.html