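I am trying to replace two nested for loops, each built on iterrows, with pandas methods in order to improve performance.

Initially I solved the problem by taking the two DataFrames, iterating over both in a nested loop, comparing values with conditionals, and then setting values inside the inner loop by index. Because this is slow and not how pandas is meant to be used, I tried methods such as apply and merge, but could not get them to solve the problem. This link gave me some guidance, but not much.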
import pandas as pd

# Columns of poa whose name contains 'CODIGO_'; the first one holds the site code
poa_col = [col for col in poa.columns if 'CODIGO_' in col]
for idx, row in df_non_dup.iterrows():
    for sub_idx, sub_row in poa.iterrows():
        if row['CODIGO_SITE'] == sub_row[poa_col[0]]:
            if '/' in row['g_names']:
                # Several technologies in one cell, e.g. '2G/3G'
                g_names_split = row['g_names'].split('/')
                for g_name in g_names_split:
                    if '2G' in g_name:
                        if pd.isnull(sub_row['ALARMAS 2G']):
                            poa.loc[sub_idx, 'ALARMAS 2G'] = row['Name']
                        else:
                            poa.loc[sub_idx, 'ALARMAS 2G'] = str(
                                sub_row['ALARMAS 2G']) + '/' + row['Name']
                    elif '3G' in g_name:
                        if pd.isnull(sub_row['ALARMAS 3G']):
                            poa.loc[sub_idx, 'ALARMAS 3G'] = row['Name']
                        else:
                            poa.loc[sub_idx, 'ALARMAS 3G'] = str(
                                sub_row['ALARMAS 3G']) + '/' + row['Name']
                    elif '4G' in g_name:
                        if pd.isnull(sub_row['ALARMAS 4G']):
                            poa.loc[sub_idx, 'ALARMAS 4G'] = row['Name']
                        else:
                            poa.loc[sub_idx, 'ALARMAS 4G'] = str(
                                sub_row['ALARMAS 4G']) + '/' + row['Name']
            else:
                # A single technology in the cell
                if '2G' in row['g_names']:
                    if pd.isnull(sub_row['ALARMAS 2G']):
                        poa.loc[sub_idx, 'ALARMAS 2G'] = row['Name']
                    else:
                        poa.loc[sub_idx, 'ALARMAS 2G'] = str(
                            sub_row['ALARMAS 2G']) + '/' + row['Name']
                elif '3G' in row['g_names']:
                    if pd.isnull(sub_row['ALARMAS 3G']):
                        poa.loc[sub_idx, 'ALARMAS 3G'] = row['Name']
                    else:
                        poa.loc[sub_idx, 'ALARMAS 3G'] = str(
                            sub_row['ALARMAS 3G']) + '/' + row['Name']
                elif '4G' in row['g_names']:
                    if pd.isnull(sub_row['ALARMAS 4G']):
                        poa.loc[sub_idx, 'ALARMAS 4G'] = row['Name']
                    else:
                        poa.loc[sub_idx, 'ALARMAS 4G'] = str(
                            sub_row['ALARMAS 4G']) + '/' + row['Name']
The above is my initial attempt. It works, but it takes a long time.

Below is some sample data:
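The 2G, 3G and 4G branches of the loop differ only in the column they write to, so the body can be collapsed before attempting any vectorization. The following is a minimal sketch under stated assumptions: `append_alarm` and `TECHS` are hypothetical names introduced here, the two small frames are made-up stand-ins for the real data, and the alarm columns are assumed to hold Python objects (`None`) rather than floats. It is still row-by-row, so it reduces duplication, not runtime.

```python
import pandas as pd

TECHS = ("2G", "3G", "4G")  # hypothetical constant; checked in this priority order


def append_alarm(poa, row_label, tech, name):
    """Append `name` to the 'ALARMAS <tech>' cell, using '/' as the separator."""
    col = f"ALARMAS {tech}"
    current = poa.at[row_label, col]
    poa.at[row_label, col] = name if pd.isnull(current) else f"{current}/{name}"


# Made-up frames shaped like the ones in the question.
poa = pd.DataFrame({
    "CODIGO_Elemento Red": ["DAF"],
    "ALARMAS 4G": [None], "ALARMAS 3G": [None], "ALARMAS 2G": [None],
})
df_non_dup = pd.DataFrame({
    "Name": ["Clapham", "Brixton"],
    "CODIGO_SITE": ["DAF", "DAF"],
    "g_names": ["2G", "2G"],
})

code_col = [c for c in poa.columns if "CODIGO_" in c][0]
for _, row in df_non_dup.iterrows():
    for sub_idx, sub_row in poa.iterrows():
        if row["CODIGO_SITE"] != sub_row[code_col]:
            continue
        # split('/') also handles a cell with a single technology
        for g_name in row["g_names"].split("/"):
            tech = next((t for t in TECHS if t in g_name), None)
            if tech is not None:
                append_alarm(poa, sub_idx, tech, row["Name"])

print(poa)
```

One difference from the original: the helper reads the current cell from `poa` itself rather than from the `sub_row` snapshot, so repeated technologies within one `g_names` cell would be appended more than once.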
poa.head(1)
Out[230]:
  CODIGO_Elemento Red  ALARMAS 4G  ALARMAS 3G  ALARMAS 2G
                  DAF         NaN         NaN         NaN

df_non_dup.head(2)
Out[231]:
        Name CODIGO_SITE g_names
0  - Clapham         DAF      2G
1  - Brixton         DAF      2G
Using the data shown, I want to append the values of df_non_dup['Name'] to the ALARMAS 2G column, because df_non_dup['g_names'] is 2G for both rows, so that poa.head(1) would look like:
Out[230]:
  CODIGO_Elemento Red  ALARMAS 4G  ALARMAS 3G       ALARMAS 2G
                  DAF         NaN         NaN  Clapham/Brixton
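One possible way to get this result without nested loops is to reshape df_non_dup first and then look the values up by site code. The sketch below is not a definitive solution; it assumes the ALARMAS columns start out empty as in the sample (existing values would be overwritten), that Name holds plain strings such as 'Clapham', and that each '/'-separated part of g_names contains at most one of 2G, 3G or 4G. The names `long`, `joined` and `code_col` are illustrative, and the 'Peckham' row is made up to exercise the '/' case.

```python
import pandas as pd

# Made-up frames shaped like the sample data in the question.
poa = pd.DataFrame({
    "CODIGO_Elemento Red": ["DAF"],
    "ALARMAS 4G": [None], "ALARMAS 3G": [None], "ALARMAS 2G": [None],
})
df_non_dup = pd.DataFrame({
    "Name": ["Clapham", "Brixton", "Peckham"],
    "CODIGO_SITE": ["DAF", "DAF", "DAF"],
    "g_names": ["2G", "2G", "3G/4G"],  # last row is hypothetical
})

# 1. One row per (Name, technology): split on '/', explode, keep the 2G/3G/4G token.
long = df_non_dup.assign(tech=df_non_dup["g_names"].str.split("/"))
long = long.explode("tech", ignore_index=True)
long["tech"] = long["tech"].str.extract(r"(2G|3G|4G)", expand=False)
long = long.dropna(subset=["tech"])

# 2. Join the names per site and technology, then pivot technologies into columns.
joined = (
    long.groupby(["CODIGO_SITE", "tech"])["Name"]
    .agg("/".join)
    .unstack("tech")
    .add_prefix("ALARMAS ")
)

# 3. Look up each site code of poa in the pivoted table, column by column.
code_col = [c for c in poa.columns if "CODIGO_" in c][0]
for col in joined.columns:
    poa[col] = poa[code_col].map(joined[col])

print(poa)
```

The groupby does the string concatenation once per site and technology, and `Series.map` replaces the inner iterrows loop with an index lookup, so the work no longer grows with the product of the two frame lengths.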