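I have a very long loop in my function that should overwrite the current dataframe (31k rows and 370 columns) as follows:
The goal is to look up values in a lookup dataframe (df_look_up) and, depending on the values found in the df_patients dataframe, overwrite the current dataframe (df_patients).
The function I have so far works fine on a small sample set, but on the larger sample set it runs for days.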

    def prepare_df_genetics(df_look_up, df_patients):
        #iterate through columns in look-Up table
        for index_col, column_snip in enumerate(df_look_up):
            #print(str(index_col) + ":" + column_snip)
            # if A1 == ALT,
            if(df_look_up[column_snip].loc['A1'] == df_look_up[column_snip].loc['ALT']):
                data = df_patients.loc[:, [column_snip]]
                for index, row in data.iterrows():
                    if (row[column_snip]) == '1/1':
                        df_patients.loc[index,column_snip] = "2"
                    elif (row[column_snip]) == '0/1':
                        df_patients.loc[index,column_snip] = "1"
                    elif (row[column_snip]) == '0/0':
                        df_patients.loc[index,column_snip] = "0"
                    else:
                        df_patients.loc[index,column_snip] = "NaN"
            #if A1 == REF,
            elif (df_look_up[column_snip].loc['A1'] == df_look_up[column_snip].loc['REF']):
                data = df_patients.loc[:, [column_snip]]
                for index, row in data.iterrows():
                    if (row[column_snip]) == '0/0':
                        df_patients.loc[index,column_snip] = "2"
                    elif (row[column_snip]) == '0/1':
                        df_patients.loc[index,column_snip] = "1"
                    elif (row[column_snip]) == '1/1':
                        df_patients.loc[index,column_snip] = "0"
                    else:
                        df_patients.loc[index,column_snip] = "NaN"
        return df_patients

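
For comparison, here is a minimal sketch of how the same recoding might be done one whole column at a time instead of cell by cell. It assumes that df_look_up has rows labelled A1, ALT and REF and one column per SNP, as the loop above implies. The function name prepare_df_genetics_vectorized, the two mapping dictionaries and the SNP names in the toy data are hypothetical, and unlike the loop above it works on a copy and skips lookup columns that are missing from df_patients.

```python
import pandas as pd

# Genotype -> allele count when A1 is the ALT allele; reversed when A1 is REF.
ALT_MAP = {"1/1": "2", "0/1": "1", "0/0": "0"}
REF_MAP = {"0/0": "2", "0/1": "1", "1/1": "0"}


def prepare_df_genetics_vectorized(df_look_up, df_patients):
    """Hypothetical column-wise variant of prepare_df_genetics."""
    out = df_patients.copy()  # assumption: do not modify the input in place
    for column_snip in df_look_up.columns:
        if column_snip not in out.columns:
            continue  # assumption: skip SNPs that have no patient column
        a1 = df_look_up.at["A1", column_snip]
        if a1 == df_look_up.at["ALT", column_snip]:
            mapping = ALT_MAP
        elif a1 == df_look_up.at["REF", column_snip]:
            mapping = REF_MAP
        else:
            continue  # neither case matches: column stays untouched
        # Series.map recodes the whole column at once; genotypes missing
        # from the mapping become missing values, which are then replaced
        # by the string "NaN" to mirror the else branches of the loop.
        out[column_snip] = out[column_snip].map(mapping).fillna("NaN")
    return out


# Toy data with made-up SNP names, only to show the call.
df_look_up = pd.DataFrame(
    {"rs1": ["A", "A", "G"], "rs2": ["C", "T", "C"]},
    index=["A1", "ALT", "REF"],
)
df_patients = pd.DataFrame(
    {"rs1": ["1/1", "0/1", "0/0", "./."], "rs2": ["1/1", "0/1", "0/0", "./."]}
)
result = prepare_df_genetics_vectorized(df_look_up, df_patients)
```

The outer loop still runs once per SNP column, but the inner iterrows loop with one .loc assignment per cell is replaced by a single map call per column, which is where most of the time is likely spent.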
The two given tables look like this:
df_lookup table and df_patients table
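And the df_patient table that needs to be overwritten looks like this:
My question is whether anyone has an idea for making this more efficient. I tried using lambda, iterrows and so on, but none of them really helped.
Any help would be greatly appreciated!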