Pandas: How to efficiently update rows based on previous values?

Posted: 2019-04-06 03:44:12

Tags: pandas optimization

I have the following code, which updates the current row based on the status of the previous row:

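# `status` and `column_a` ... `column_d` are variables holding column names
prev_status = 0
for idx, row in df.iterrows():
    # Update only if the previous row's status was 1 or 2 and column_a is non-zero
    if prev_status in [1, 2] and row[column_a] != 0:
        row[column_b] += row[column_a]
        row[column_c] = 0
        row[column_d] = 0
        row[column_a] = 0
    prev_status = row[status]
    # iterrows() may yield a copy of each row, so the changes are written back explicitly
    df.loc[idx] = row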

However, this is very slow when run on 1 GB of data. Is there any way to optimize it?

2 answers:

Answer 0 (score: 0)

For example, use shift:

df["new_column"] = df["column_name"].shift(x)

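This creates a new column whose values are those of another column shifted down by x rows (with x = 1, each row holds the previous row's value). Computing on whole columns in this vectorized way is much faster than applying a function to every row of the DataFrame.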
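A minimal illustration of `shift`, using a hypothetical five-row `status` column: with x = 1, each row receives the previous row's status and the first row, having no predecessor, gets `NaN`. A condition such as "the previous status is 1 or 2" can then be evaluated for all rows at once with `isin`.

```python
import pandas as pd

# Hypothetical toy data; the column name mirrors the question.
df = pd.DataFrame({"status": [0, 1, 2, 0, 1]})

# shift(1) moves every value down one row, so each row sees the previous
# row's status; the first row has no predecessor and gets NaN.
df["previous_status"] = df["status"].shift(1)
print(df["previous_status"].tolist())  # [nan, 0.0, 1.0, 2.0, 0.0]

# A boolean mask built from the shifted column is computed for all rows at once.
prev_in_1_2 = df["previous_status"].isin([1, 2])
print(prev_in_1_2.tolist())  # [False, False, True, True, False]
```

Because `NaN` is not in `[1, 2]`, the first row is treated the same way as `prev_status = 0` in the original loop.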

Answer 1 (score: 0)

Try this:

df['previous_status'] = df['status'].shift(1)
# `x in [1, 2]` cannot test a Series element-wise, so use isin(); the comparison needs
# parentheses because & binds tighter than !=. Build the mask once, before column_a is zeroed.
mask = df['previous_status'].isin([1, 2]) & (df['column_a'] != 0)
df.loc[mask, 'column_b'] += df.loc[mask, 'column_a']
df.loc[mask, ['column_c', 'column_d', 'column_a']] = 0
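A minimal sketch that checks the mask-based approach reproduces the row-by-row loop from the question. It assumes that `status` is not one of the four columns the loop overwrites (otherwise `prev_status` would read an already-updated value) and uses string column names instead of the question's variables. The sample DataFrame and the helpers `update_with_loop` and `update_vectorized` are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "status":   [1, 0, 2, 2, 0, 1],
    "column_a": [5, 3, 4, 0, 7, 2],
    "column_b": [10, 10, 10, 10, 10, 10],
    "column_c": [1, 1, 1, 1, 1, 1],
    "column_d": [9, 9, 9, 9, 9, 9],
})

def update_with_loop(df):
    """Row-by-row version from the question (slow reference)."""
    df = df.copy()
    prev_status = 0
    for idx, row in df.iterrows():
        if prev_status in [1, 2] and row["column_a"] != 0:
            row["column_b"] += row["column_a"]
            row["column_c"] = 0
            row["column_d"] = 0
            row["column_a"] = 0
        prev_status = row["status"]
        df.loc[idx] = row
    return df

def update_vectorized(df):
    """Same update expressed as one boolean mask over whole columns."""
    df = df.copy()
    # fill_value=0 mirrors prev_status = 0 before the first row.
    prev_status = df["status"].shift(1, fill_value=0)
    mask = prev_status.isin([1, 2]) & (df["column_a"] != 0)
    df.loc[mask, "column_b"] += df.loc[mask, "column_a"]
    df.loc[mask, ["column_c", "column_d", "column_a"]] = 0
    return df

looped = update_with_loop(df)
vectorized = update_vectorized(df)
# Raises AssertionError if the two results differ in values or dtypes.
pd.testing.assert_frame_equal(looped, vectorized)
print(vectorized)
```

The vectorized version touches each column a handful of times instead of building and writing back a Series per row, which is where the loop spends most of its time on large inputs.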