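I have a problem that should be solvable, but my solution is far too slow (it takes several hours):
I have a dataset in which the values of some columns need to be changed based on a condition.
I wrote this code: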
for i, row in df.iterrows():
    if row.specific_column_value == 1:
        continue
    col_names = ['A1', 'A2', ..., 'An']
    new_values = [1, 2, 3, 4, .., n]
    for j, col in enumerate(col_names):
        df.loc[i, col] = new_values[j]
This is very slow.
How can I speed it up?
Answer 0 (score: 1)
You can select the rows with a boolean condition inside .loc
and assign the new values to all of the columns
in a single step:
df.loc[ df.order_received == 1, col_names ] = new_values
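As a minimal runnable sketch of the boolean-mask assignment above: the toy frame, the column names A1 to A3, and their values are made up for illustration and are not taken from the question's data. The mask here uses != 1 because the loop in the question skips rows where the value is 1; flip it to == 1 if those are the rows you want to overwrite.

```python
import pandas as pd

# Hypothetical toy data; real column names and values will differ.
df = pd.DataFrame({
    "specific_column_value": [1, 0, 2, 1],
    "A1": [10, 10, 10, 10],
    "A2": [20, 20, 20, 20],
    "A3": [30, 30, 30, 30],
})

col_names = ["A1", "A2", "A3"]
new_values = [1, 2, 3]  # one value per column in col_names

# Boolean mask of the rows to overwrite (rows equal to 1 are left alone).
mask = df["specific_column_value"] != 1

# One vectorized assignment; the list is broadcast across every selected row.
df.loc[mask, col_names] = new_values
```

Because the whole update is a single indexed assignment, pandas does the work in one pass instead of once per cell, which is where most of the time in the iterrows() version goes.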
Update
for i, row in df.iterrows():
    if row.specific_column_value == 1:
        col_names = ['A1', 'A2', ..., 'An']
        new_values = [1, 2, 3, 4, .., n]
        df.loc[i, col_names] = new_values
Answer 1 (score: 0)
If you have a limited number of columns (n), you may be able to reduce the search
to O(n) operations instead of the O(m x n) complexity of your current approach:
inx_collection = set()
value_looking_for = 1
col_values = [1, 2, 3, 4, .., n]
for col in df.columns:
    inx = df.index[df[col] == value_looking_for]
    inx_collection.update(inx)  # this set collects all index labels of rows containing the value
# recent pandas versions do not accept a set as an indexer, so convert it to a list
df.loc[list(inx_collection), :] = col_values
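The same row selection can also be written without the explicit column loop. The sketch below assumes, as the code above appears to, that a row should be overwritten whenever any of its columns equals the target value; the toy frame and its values are invented for illustration.

```python
import pandas as pd

# Hypothetical toy data for illustration only.
df = pd.DataFrame({
    "A1": [1, 5, 7],
    "A2": [4, 1, 8],
    "A3": [6, 5, 9],
})

value_looking_for = 1
col_values = [1, 2, 3]  # one replacement value per column of df

# True for every row in which at least one column equals the value.
row_mask = df.eq(value_looking_for).any(axis=1)

# Overwrite all columns of the matching rows in a single assignment.
df.loc[row_mask, :] = col_values
```

If only one column decides which rows change, as in the question, a mask built from that single column is simpler and avoids scanning the other columns.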