Question

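I have a problem that should be solvable, but my solution is far too slow (it takes hours):

I have a dataset in which the values of some columns need to be changed based on a condition.

I wrote this code: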

for i, row in df.iterrows():
    if row.specific_column_value == 1:
        continue

    col_names = ['A1', 'A2', ..., 'An']
    new_values = [1, 2, 3, 4, .., n]

    for j, col in enumerate(col_names):
        df.loc[i, col] = new_values[j]

This is very slow.

How can I speed it up?

Answer 1

You can pass the condition to .loc as the row selector and assign the new values to all of the columns in a single step:

df.loc[df.order_received == 1, col_names] = new_values
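To make the one-liner above concrete, here is a minimal sketch on made-up data. The column names `flag`, `A1`, `A2`, `A3` and all values are assumptions for illustration and do not come from the question. Note that the loop in the question skips rows where the value is 1, so this sketch uses `!= 1` as the mask; flip it to `== 1` if that is the condition you actually want.

```python
import pandas as pd

# Hypothetical data: `flag` stands in for the condition column.
df = pd.DataFrame({
    "flag": [1, 0, 2, 1],
    "A1": [10, 10, 10, 10],
    "A2": [20, 20, 20, 20],
    "A3": [30, 30, 30, 30],
})

col_names = ["A1", "A2", "A3"]
new_values = [1, 2, 3]  # one new value per column in col_names

# Boolean mask selecting the rows to update (rows with flag == 1 are skipped).
mask = df["flag"] != 1

# Single vectorized assignment: the list is broadcast to every selected row.
df.loc[mask, col_names] = new_values

print(df)
```

The whole update is a single indexing operation, so no Python-level loop over rows is needed.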

Update

for i, row in df.iterrows():
    if row.specific_column_value == 1:
        col_names = ['A1', 'A2', ..., 'An']
        new_values = [1, 2, 3, 4, .., n]
        df.loc[i, col_names] = new_values

Answer 2

If the number of columns (n) is limited, you may be able to reduce the search to O(n) Python-level steps, one vectorized scan per column, instead of the O(m x n) cell-by-cell work in your current approach, where m is the number of rows:

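inx_collection = set()
value_looking_for = 1
col_values = [1, 2, 3, 4, .., n]
for col in df.columns:
    # Index labels of the rows where this column holds the value
    inx = df.index[df[col] == value_looking_for]
    inx_collection.update(inx)  # This set collects all indices containing the value
# .loc does not accept a set as an indexer, so convert it to a list first
df.loc[list(inx_collection), :] = col_values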
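The same idea can probably be written without the explicit loop over columns. The sketch below is only an illustration on made-up data (the frame, its column names, and the new values are all assumptions). It builds one row mask with `DataFrame.eq` and `any(axis=1)`, then overwrites every column of the matching rows, which is what `df.loc[..., :]` does in the snippet above.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "A1": [1, 5, 7],
    "A2": [4, 5, 1],
    "A3": [6, 5, 8],
})

value_looking_for = 1
col_values = [10, 20, 30]  # one new value per column of df

# True for every row in which at least one column equals the value.
row_mask = df.eq(value_looking_for).any(axis=1)

# Overwrite all columns of the matching rows in one vectorized step.
df.loc[row_mask, :] = col_values

print(df)
```

A boolean mask also avoids collecting index labels in a set, so the row order and any duplicate index labels are left for pandas to handle.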

Setting new values for a set of columns in Pandas
