A loop in Python takes a very long time to produce a result. The DataFrame contains about 100,000 records.
How can I reduce the run time?
df['loan_agr'] = df['loan_agr'].astype(int)
for i in range(len(df)):
    if df.loc[i,'order_mt']== df.loc[i,'enr_mt']:
        df['new_N_Loan'] = 1
        df['exist_N_Loan'] = 0
        df['new_V_Loan'] = df['loan_agr']
        df['exist_V_Loan'] = 0
    else:
        df['new_N_Loan'] = 0
        df['exist_N_Loan'] = 1
        df['new_V_Loan'] = 0
        df['exist_V_Loan'] = df['loan_agr']
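Answer 0 (score: 5)
You can use loc and set the new values in a vectorized way. This approach is much faster than iterating, because each operation is performed on a whole column at once rather than on individual values. See this article for more on pandas speed optimization.
For example: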
mask = df['order_mt'] == df['enr_mt']
df.loc[mask, ['new_N_Loan', 'exist_N_Loan', 'exist_V_Loan']] = [1, 0, 0]
df.loc[mask, ['new_V_Loan']] = df['loan_agr']
df.loc[~mask, ['new_N_Loan', 'exist_N_Loan', 'new_V_Loan']] = [0, 1, 0]
df.loc[~mask, ['exist_V_Loan']] = df['loan_agr']
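Edit:
If your pandas version does not support the ~ (bitwise NOT) operator, you can build a new mask for the "else" condition, analogous to the first one.
For example: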
mask = df['order_mt'] == df['enr_mt']
else_mask = df['order_mt'] != df['enr_mt']
Then use else_mask instead of ~mask for the second set of assignments.
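As a rough sketch of that second set of assignments with else_mask: the sample values below are made up for illustration, and pre-creating the four result columns with 0 is an assumption of this sketch, not something the answer requires.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "order_mt": [1, 2, 3, 4],
    "enr_mt": [1, 2, 30, 40],
    "loan_agr": [100, 200, 300, 400],
})

# Assumption: create the result columns up front so .loc only updates existing columns.
for col in ["new_N_Loan", "exist_N_Loan", "new_V_Loan", "exist_V_Loan"]:
    df[col] = 0

mask = df["order_mt"] == df["enr_mt"]
else_mask = df["order_mt"] != df["enr_mt"]  # explicit mask for the "else" branch

# Rows where the two months differ: the "else" branch of the original loop.
df.loc[else_mask, ["new_N_Loan", "exist_N_Loan", "new_V_Loan"]] = [0, 1, 0]
# The Series on the right is aligned on the index, so only the masked rows are written.
df.loc[else_mask, "exist_V_Loan"] = df["loan_agr"]
```

For these plain boolean columns, else_mask selects the same rows as ~mask, so the two forms should be interchangeable here.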
Example:
Input:
   order_mt  enr_mt new_N_Loan exist_N_Loan exist_V_Loan new_V_Loan  loan_agr
0         1       1       None         None         None       None       100
1         2       2       None         None         None       None       200
2         3      30       None         None         None       None       300
3         4      40       None         None         None       None       400
Output:
   order_mt  enr_mt  new_N_Loan  exist_N_Loan  exist_V_Loan  new_V_Loan  loan_agr
0         1       1           1             0             0         100       100
1         2       2           1             0             0         200       200
2         3      30           0             1           300           0       300
3         4      40           0             1           400           0       400
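To try the example end to end, here is a minimal, self-contained sketch that rebuilds the input table and applies the mask-based assignments. Building the frame by hand, using a plain column label rather than a one-element list for the single-column assignments, and the final astype(int) cleanup are choices made for this sketch, not part of the original answer.

```python
import pandas as pd

# Rebuild the input table; the four derived columns start out as None.
df = pd.DataFrame({
    "order_mt": [1, 2, 3, 4],
    "enr_mt": [1, 2, 30, 40],
    "new_N_Loan": [None] * 4,
    "exist_N_Loan": [None] * 4,
    "exist_V_Loan": [None] * 4,
    "new_V_Loan": [None] * 4,
    "loan_agr": [100, 200, 300, 400],
})

mask = df["order_mt"] == df["enr_mt"]

# "if" branch: new loans (order month equals enrollment month).
df.loc[mask, ["new_N_Loan", "exist_N_Loan", "exist_V_Loan"]] = [1, 0, 0]
df.loc[mask, "new_V_Loan"] = df["loan_agr"]

# "else" branch: existing loans.
df.loc[~mask, ["new_N_Loan", "exist_N_Loan", "new_V_Loan"]] = [0, 1, 0]
df.loc[~mask, "exist_V_Loan"] = df["loan_agr"]

# The derived columns began as object dtype (None); cast them to integers.
derived = ["new_N_Loan", "exist_N_Loan", "exist_V_Loan", "new_V_Loan"]
df[derived] = df[derived].astype(int)

print(df)
```

Every row is covered by exactly one of the two masks, so no None values should remain by the time the cast runs.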
Answer 1 (score: 0)
You can replace the len call with a precomputed value instead of using range(len(...)).