I have a DataFrame like the one below:
PROCESS_NO PROCESS_NAME RESULT_2 RESULT_3
10254 AAA 4.40 46.67
10254 AAA 4.45 48.33
10254 AAA 4.50 50.00
10254 AAA 4.45 48.33
10254 AAA 4.50 50.00
10255 BBB 4.50 50.00
10255 BBB 4.50 50.00
10254 AAA 4.45 48.33
10254 AAA 4.45 48.33
10254 AAA 4.45 48.33
10255 BBB 4.50 51.60
10255 BBB 4.50 52.80
10255 BBB 4.50 56.80
10255 BBB 4.50 51.70
10255 BBB 4.46 57.90
10255 BBB 4.44 52.00
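For each row, I want to check whether the RESULT_2 and RESULT_3 values are equal to or greater than the values in the previous 3 rows, grouped by PROCESS_NO and PROCESS NAME, and store True or False in a separate column for each.
I would like a result DataFrame like this.
PROCESS_NO PROCESS NAME RESULT_2 CHECK_2 RESULT_3 CHECK_3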
10254 AAA 4.40 FALSE 46.67 FALSE
10254 AAA 4.45 FALSE 48.33 FALSE
10254 AAA 4.45 TRUE 48.33 TRUE
10254 AAA 4.45 TRUE 48.33 TRUE
10254 AAA 4.45 TRUE 48.33 TRUE
10254 AAA 4.50 TRUE 50.00 TRUE
10254 AAA 4.45 FALSE 48.33 FALSE
10254 AAA 4.50 TRUE 50.00 TRUE
10255 BBB 4.50 FALSE 50.00 FALSE
10255 BBB 4.50 FALSE 50.00 FALSE
10255 BBB 4.50 TRUE 51.60 TRUE
10255 BBB 4.50 TRUE 52.80 TRUE
10255 BBB 4.50 TRUE 56.80 TRUE
10255 BBB 4.50 TRUE 51.70 FALSE
10255 BBB 4.46 FALSE 57.90 TRUE
10255 BBB 4.44 FALSE 52.00 FALSE
Answer 0 (score: 5)
Without NumPy, and in the simplest way I can think of:
import pandas as pd
data = [[10254, 'AAA', 4.40, 46.67],
        [10255, 'BBB', 4.50, 50.00],
        [10255, 'BBB', 4.50, 50.00],
        [10254, 'AAA', 4.45, 48.33],
        [10254, 'AAA', 4.50, 50.00],
        [10254, 'AAA', 1.50, 10.00]]
dataframe = pd.DataFrame(data, columns=['PROCESS_NO', 'PROCESS NAME', 'RESULT_2', 'RESULT_3'])
# Every row starts as 'FALSE'; rows 0-2 have fewer than three predecessors and stay that way.
dataframe['CHECK_2'] = 'FALSE'
dataframe['CHECK_3'] = 'FALSE'
check2_position = dataframe.columns.get_loc('CHECK_2')
check3_position = dataframe.columns.get_loc('CHECK_3')
# Note: the loop walks consecutive rows of the whole frame (no grouping) and sets 'TRUE'
# when the value is >= at least one of the three previous rows.
for i in range(3, len(dataframe)):
    current_result2 = dataframe.iloc[i]['RESULT_2']
    if (current_result2 >= dataframe.iloc[i-1]['RESULT_2'] or
            current_result2 >= dataframe.iloc[i-2]['RESULT_2'] or
            current_result2 >= dataframe.iloc[i-3]['RESULT_2']):
        dataframe.iat[i, check2_position] = 'TRUE'
    current_result3 = dataframe.iloc[i]['RESULT_3']
    if (current_result3 >= dataframe.iloc[i-1]['RESULT_3'] or
            current_result3 >= dataframe.iloc[i-2]['RESULT_3'] or
            current_result3 >= dataframe.iloc[i-3]['RESULT_3']):
        dataframe.iat[i, check3_position] = 'TRUE'
print(dataframe)
The result is as desired:
PROCESS_NO PROCESS NAME RESULT_2 RESULT_3 CHECK_2 CHECK_3
0 10254 AAA 4.40 46.67 FALSE FALSE
1 10255 BBB 4.50 50.00 FALSE FALSE
2 10255 BBB 4.50 50.00 FALSE FALSE
3 10254 AAA 4.45 48.33 TRUE TRUE
4 10254 AAA 4.50 50.00 TRUE TRUE
5 10254 AAA 1.50 10.00 FALSE FALSE
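For a per-group check, here is a minimal sketch of one possible approach. It rests on two assumptions read off the expected table in the question: a row is marked True only when its value is >= every one of the (up to) three previous rows in the same group, and at least two previous rows must exist before a row can be True. The helper name `ge_previous_rows` and its parameters `n` and `min_prev` are made up for illustration.

```python
import pandas as pd

# Rows of the expected table from the question (already ordered by group).
rows = [
    (10254, 'AAA', 4.40, 46.67), (10254, 'AAA', 4.45, 48.33),
    (10254, 'AAA', 4.45, 48.33), (10254, 'AAA', 4.45, 48.33),
    (10254, 'AAA', 4.45, 48.33), (10254, 'AAA', 4.50, 50.00),
    (10254, 'AAA', 4.45, 48.33), (10254, 'AAA', 4.50, 50.00),
    (10255, 'BBB', 4.50, 50.00), (10255, 'BBB', 4.50, 50.00),
    (10255, 'BBB', 4.50, 51.60), (10255, 'BBB', 4.50, 52.80),
    (10255, 'BBB', 4.50, 56.80), (10255, 'BBB', 4.50, 51.70),
    (10255, 'BBB', 4.46, 57.90), (10255, 'BBB', 4.44, 52.00),
]
df = pd.DataFrame(rows, columns=['PROCESS_NO', 'PROCESS NAME', 'RESULT_2', 'RESULT_3'])
keys = ['PROCESS_NO', 'PROCESS NAME']

def ge_previous_rows(frame, column, n=3, min_prev=2):
    """Hypothetical helper: True where the value is >= all of the previous n rows of its group.

    Assumption: a row needs at least `min_prev` previous rows in its group, else False.
    """
    # shift(1) drops the current row from the window, so the rolling max
    # covers only the rows that come before it within the same group.
    prev_max = frame.groupby(keys)[column].transform(
        lambda s: s.shift(1).rolling(n, min_periods=min_prev).max()
    )
    # Comparing against NaN (not enough history) yields False.
    return frame[column] >= prev_max

df['CHECK_2'] = ge_previous_rows(df, 'RESULT_2')
df['CHECK_3'] = ge_previous_rows(df, 'RESULT_3')
print(df)
```

Being >= all previous rows is the same as being >= their maximum, which is why a rolling maximum is enough here. If the intended rule is "at least one of the previous rows", a rolling minimum would be the analogous choice.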
Hope this helps.
Cheers.
Answer 1 (score: 2)
Try this.
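# `df` is assumed to be the input DataFrame; the answer does not show how it was built.
def greater_than(df_grp):
    # Note: the windows are taken from the outer `df`, not from `df_grp`, so they span
    # consecutive rows of the whole frame. x[:1] holds only the first window element,
    # so each value is compared with the value two rows earlier.
    # raw=True passes each window as a NumPy array, so x[2] and x[:1] index by position.
    df_grp['CHECK_2'] = df['RESULT_2'].rolling(3).apply(lambda x: all(x[2] >= i for i in x[:1]), raw=True)
    df_grp['CHECK_3'] = df['RESULT_3'].rolling(3).apply(lambda x: all(x[2] >= i for i in x[:1]), raw=True)
    # Rows without a full window are NaN; turn them into 0.
    df_grp[['CHECK_2', 'CHECK_3']] = df_grp[['CHECK_2', 'CHECK_3']].fillna(0).astype(int)
    return df_grp
# group_keys=False keeps the original row index instead of adding the group keys to it.
result = df.groupby(['PROCESS_NO', 'PROCESS NAME'], group_keys=False).apply(greater_than)
print(result)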
Output
PROCESS_NO PROCESS NAME RESULT_2 RESULT_3 CHECK_2 CHECK_3
0 10254 AAA 4.40 46.67 0 0
1 10255 BBB 4.50 50.00 0 0
2 10255 BBB 4.50 50.00 1 1
3 10254 AAA 4.45 48.33 0 0
4 10254 AAA 4.45 48.33 0 0
5 10254 AAA 4.45 48.33 1 1
6 10254 AAA 4.45 48.33 1 1
7 10254 AAA 4.50 50.00 1 1
8 10255 BBB 4.50 51.60 1 1
9 10255 BBB 4.50 52.80 1 1
10 10255 BBB 4.50 56.80 1 1
11 10255 BBB 4.50 51.70 1 0
12 10255 BBB 4.50 57.90 1 1
13 10255 BBB 4.50 52.00 1 1
14 10254 AAA 4.45 48.33 0 0
15 10254 AAA 4.50 50.00 1 0
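If the goal is a strictly per-group comparison against the previous three rows, one possible variation on the rolling approach is sketched below. The assumptions: the window has 4 rows (the current row plus its three predecessors), the windows are built inside each group via `transform`, and rows with fewer than three predecessors in their group get 0. The helper name `last_ge_rest` is made up for illustration; the input is the 16 rows shown in the output above.

```python
import pandas as pd

# The 16 rows from the output above, in the same order.
rows = [
    (10254, 'AAA', 4.40, 46.67), (10255, 'BBB', 4.50, 50.00),
    (10255, 'BBB', 4.50, 50.00), (10254, 'AAA', 4.45, 48.33),
    (10254, 'AAA', 4.45, 48.33), (10254, 'AAA', 4.45, 48.33),
    (10254, 'AAA', 4.45, 48.33), (10254, 'AAA', 4.50, 50.00),
    (10255, 'BBB', 4.50, 51.60), (10255, 'BBB', 4.50, 52.80),
    (10255, 'BBB', 4.50, 56.80), (10255, 'BBB', 4.50, 51.70),
    (10255, 'BBB', 4.50, 57.90), (10255, 'BBB', 4.50, 52.00),
    (10254, 'AAA', 4.45, 48.33), (10254, 'AAA', 4.50, 50.00),
]
df = pd.DataFrame(rows, columns=['PROCESS_NO', 'PROCESS NAME', 'RESULT_2', 'RESULT_3'])
keys = ['PROCESS_NO', 'PROCESS NAME']

def last_ge_rest(window):
    # With raw=True the window is a NumPy array; the last element is the current row,
    # the elements before it are the three previous rows of the same group.
    return float(window[-1] >= window[:-1].max())

for source, target in [('RESULT_2', 'CHECK_2'), ('RESULT_3', 'CHECK_3')]:
    # transform applies the rolling check inside each group and returns
    # a Series aligned with the original row index.
    rolled = df.groupby(keys)[source].transform(
        lambda s: s.rolling(4).apply(last_ge_rest, raw=True)
    )
    # Incomplete windows are NaN; treat them as 0 (an assumption, see above).
    df[target] = rolled.fillna(0).astype(int)

print(df)
```

With this grouping, row 14 (AAA, 4.45) is compared with rows 5, 6 and 7 of its own group rather than with the BBB rows directly before it.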
Explanation
For more information, see: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rolling.html