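I'm new to coding in Python. I'm currently trying to analyse a data frame that contains several workflows. Each workflow has different processing steps that initiate and end it. In a simplified version, my data looks like this: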
   Workflow Initiate   End_1   End_2   End_3
0         1   Name_1      na  Name_1      na
1         2   Name_2      na      na      na
2         3   Name_3      na      na  Name_5
3         4   Name_4  Name_5      na      na
4         5       na      na      na  Name_5
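For anyone who wants to reproduce the problem, the table above can be rebuilt roughly as follows. This is a minimal sketch that assumes the missing values are the literal string 'na' (as the comparisons in the question suggest) rather than real NaN values.

```python
import pandas as pd

# Sample data from the question; 'na' is kept as a plain string placeholder.
df = pd.DataFrame({
    'Workflow': [1, 2, 3, 4, 5],
    'Initiate': ['Name_1', 'Name_2', 'Name_3', 'Name_4', 'na'],
    'End_1': ['na', 'na', 'na', 'Name_5', 'na'],
    'End_2': ['Name_1', 'na', 'na', 'na', 'na'],
    'End_3': ['na', 'na', 'Name_5', 'na', 'Name_5'],
})
print(df)
```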
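For each workflow, I want to check whether the name that ended the workflow differs from the name that initiated it.
Iterating over the rows as follows gives the desired output: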
for index, row in df.iterrows():
    # a real initiator name that also appears in one of the End columns
    if (row['Initiate'] != 'na'
            and (row['Initiate'] == row['End_1']
                 or row['Initiate'] == row['End_2']
                 or row['Initiate'] == row['End_3'])):
        print("Name end equals initiate")
    # no End column holds a name
    elif (row['End_1'] == 'na'
            and row['End_2'] == 'na'
            and row['End_3'] == 'na'):
        print("No name ended")
    else:
        print("Different name ended")
Name end equals initiate
No name ended
Different name ended
Different name ended
Different name ended
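However, I would like to add an "Analysis" column to the data frame that shows the result above next to each workflow.
To do so, I wrapped the code in a function: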
def function_name(a, b, c, d):
    for index, row in df.iterrows():
        if ((a != 'na')
            and (a == b) |
                (a == c) |
                (a == d)
            ):
            return "Name end equals initiate"
        elif ((b == 'na') &
              (c == 'na') &
              (d == 'na')
              ):
            return "No name ended"
        else:
            return "Different name ended"

df['Analysis'] = function_name(row['Initiate'],
                               row['End_1'],
                               row['End_2'],
                               row['End_3'])
print(df)
Workflow Initiate ... End_3 Analysis
0 1 Name_1 ... na Different name ended
1 2 Name_2 ... na Different name ended
2 3 Name_3 ... Name_5 Different name ended
3 4 Name_4 ... na Different name ended
4 5 na ... Name_5 Different name ended
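As you can see, the output differs from the first analysis. I would like to add an extra column to the data frame that gives me the same output as the print statements.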
Answer 0 (score: 0)
You should avoid row-wise loops here. Your algorithm can be vectorised:
import numpy as np

df = df.replace('na', np.nan)  # replace string 'na' with NaN for efficient processing
ends = df.filter(like='End')   # select the columns whose name contains 'End'
# forward-fill across columns so the last column holds the last non-null End name;
# note that only this last name is compared with Initiate
match = ends.ffill(axis=1).iloc[:, -1] == df['Initiate']
nulls = ends.isnull().all(axis=1)  # rows in which every End column is null
# apply vectorised conditional logic
df['Result'] = np.select([match, nulls],
                         ['Name end equals initiate', 'No name ended'],
                         'Different name ended')
print(df)
Workflow Initiate End_1 End_2 End_3 Result
0 1 Name_1 NaN Name_1 NaN Name end equals initiate
1 2 Name_2 NaN NaN NaN No name ended
2 3 Name_3 NaN NaN Name_5 Different name ended
3 4 Name_4 Name_5 NaN NaN Different name ended
4 5 NaN NaN NaN Name_5 Different name ended
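The forward-fill approach only compares the last non-null End name with Initiate, whereas the loop in the question matches when any End column equals Initiate. If that distinction matters for the real data, a variant along the following lines may be closer to the original logic. It is a sketch that rebuilds the sample frame so it runs on its own, and it swaps the `ffill` step for `DataFrame.eq(..., axis=0).any(axis=1)`.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with 'na' as a string placeholder.
df = pd.DataFrame({
    'Workflow': [1, 2, 3, 4, 5],
    'Initiate': ['Name_1', 'Name_2', 'Name_3', 'Name_4', 'na'],
    'End_1': ['na', 'na', 'na', 'Name_5', 'na'],
    'End_2': ['Name_1', 'na', 'na', 'na', 'na'],
    'End_3': ['na', 'na', 'Name_5', 'na', 'Name_5'],
})

df = df.replace('na', np.nan)
ends = df.filter(like='End')

# True where at least one End column equals Initiate in the same row.
# NaN never compares equal, so a missing Initiate cannot produce a match.
match = ends.eq(df['Initiate'], axis=0).any(axis=1)
# True where every End column is missing.
nulls = ends.isna().all(axis=1)

df['Result'] = np.select([match, nulls],
                         ['Name end equals initiate', 'No name ended'],
                         default='Different name ended')
print(df)
```

On the sample data both variants give the same labels; they would only diverge for a row where Initiate matches an earlier End column but a different name appears in a later one.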
Answer 1 (score: 0)
Try using np.select():
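import numpy as np

conditions = [
    # Initiate holds a real name that also appears in one of the End columns
    (df['Initiate'] != 'na')
    & ((df['Initiate'] == df['End_1'])
       | (df['Initiate'] == df['End_2'])
       | (df['Initiate'] == df['End_3'])),
    # no End column holds a name
    (df['End_1'] == 'na') & (df['End_2'] == 'na') & (df['End_3'] == 'na'),
]
answers = ['Name end equals initiate', 'No name ended']
# np.select picks the first condition that is True; the default covers the rest
df['Analysis'] = np.select(conditions, answers, default='Different name ended')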
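If keeping the row-wise logic from the question is preferred, `DataFrame.apply` with `axis=1` is another option. The attempt in the question appears to fail because the function is called only once, with whatever `row` was left over from the earlier loop, and the `return` inside the inner loop exits on the first iteration, so a single string is broadcast to every row. The sketch below assumes the 'na' strings are kept as they are; `classify` is a hypothetical helper name, and the sample frame is rebuilt so the snippet runs on its own.

```python
import pandas as pd

# Sample data from the question, with 'na' as a string placeholder.
df = pd.DataFrame({
    'Workflow': [1, 2, 3, 4, 5],
    'Initiate': ['Name_1', 'Name_2', 'Name_3', 'Name_4', 'na'],
    'End_1': ['na', 'na', 'na', 'Name_5', 'na'],
    'End_2': ['Name_1', 'na', 'na', 'na', 'na'],
    'End_3': ['na', 'na', 'Name_5', 'na', 'Name_5'],
})


def classify(row):
    """Return the analysis label for one row (hypothetical helper)."""
    ends = [row['End_1'], row['End_2'], row['End_3']]
    if row['Initiate'] != 'na' and row['Initiate'] in ends:
        return 'Name end equals initiate'
    if all(end == 'na' for end in ends):
        return 'No name ended'
    return 'Different name ended'


# axis=1 passes each row to classify, so every row gets its own label.
df['Analysis'] = df.apply(classify, axis=1)
print(df)
```

This is usually slower than the vectorised versions on large frames, but it stays close to the original loop and is easy to extend with further rules.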