Pandas DataFrame: if/else condition based on the previous row does not work

Time: 2019-10-15 20:10:11

Tags: python

I have a pandas DataFrame that looks like this:

df = pd.DataFrame({'X':[1,1,1, 0, 0]})
df

    X
0   1
1   1
2   1
3   0
4   0

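Now I want to modify X according to the following condition:

If X = 0, set it to the previous row's value + 1. My final output should therefore look like this: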

    X
0   1   
1   1 
2   1
3   2
4   3

This can be done by iterating over the rows and using iloc to access the current and previous row, and it works as expected:

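x_col = df.columns.get_loc('X')
for i in range(len(df)):
    # The first row has no predecessor, so it is compared with itself
    previous = df.iloc[i - 1, x_col] if i > 0 else df.iloc[i, x_col]
    if df.iloc[i, x_col] == 0:
        # Assign through df.iloc so the change is written to df, not to a temporary row copy
        df.iloc[i, x_col] = previous + 1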

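I would like a more efficient approach and tried the following code, but the output is not what I expected (the X value in the fifth row should be 3):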

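conditions = [df["X"] == 0]
values = [df["X"].shift() + 1]
# default keeps the original value where the condition is False (np.select would otherwise fill in 0)
df['X'] = np.select(conditions, values, default=df["X"])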

>>> df
     X
0  1
1  1
2  1
3  2
4  1
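
The last row comes out as 1 because `shift()` reads the original column, not the values that are being updated. A minimal sketch that prints the intermediate values (the names `shifted` and `candidate` are introduced here purely for illustration):

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 1, 1, 0, 0]})

# Previous row of the ORIGINAL column: [NaN, 1, 1, 1, 0]
shifted = df['X'].shift()

# Candidate replacement values: [NaN, 2, 2, 2, 1]
candidate = shifted + 1

# Row 4 sees the original previous value (0), not the updated one (2),
# so the candidate there is 1 rather than 3
print(candidate.tolist())
```

Each row depends on the already-updated value of the row before it, so a single shift of the original column cannot express the rule.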

2 Answers:

Answer 0: (score: 1)

You can try the following:

import numpy as np
import pandas as pd

df = pd.DataFrame({'X': [1, 1, 1, 0, 0]})

# X values of the rows whose next row is zero (0 elsewhere)
pe_zero = df.X.shift(-1).eq(0) * df.X  # [0 0 1 0 0]

# 1 for each zero value, since each zero adds one to the previous value
eq_zero = df.X.eq(0)

# find consecutive groups of 0
groups = pe_zero + eq_zero
consecutive = (groups.gt(0) != groups.gt(0).shift()).cumsum()

# find cumulative sum by groups
cumulative = groups.groupby(consecutive).cumsum()

# choose from cumulative when equals to zero else from original
result = np.where(eq_zero, cumulative, df.X)

print(result)

Output

[1 1 1 2 3]

Update

For df = pd.DataFrame({'X': [1, 1, 1, 0, 0, 1, 1, 0, 0]}) it returns:

[1 1 1 2 3 1 1 2 3]
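
The same idea can also be sketched as "last non-zero value carried forward, plus the position inside the current run of zeros". This is an alternative formulation, not the code from this answer; the function name `increment_zero_runs` is hypothetical, and it assumes that zeros at the very start of the column count up from 0.

```python
import pandas as pd


def increment_zero_runs(x: pd.Series) -> pd.Series:
    """Replace each zero with the last non-zero value plus its position in the zero run."""
    is_zero = x.eq(0)
    # Every non-zero row starts a new block; the zeros that follow share its block id
    block = (~is_zero).cumsum()
    # Running count of zeros inside each block: 0 on the non-zero row, then 1, 2, 3, ...
    offset = is_zero.astype(int).groupby(block).cumsum()
    # Last non-zero value carried forward; leading zeros have none, so 0 is assumed
    base = x.where(~is_zero).ffill().fillna(0)
    return (base + offset).astype(x.dtype)


df = pd.DataFrame({'X': [1, 1, 1, 0, 0, 1, 1, 0, 0]})
print(increment_zero_runs(df['X']).tolist())  # [1, 1, 1, 2, 3, 1, 1, 2, 3]
```

Because the base value is forward-filled separately from the zero counter, each run of zeros restarts from the non-zero value immediately before it.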

Answer 1: (score: 0)

You can try the following approach:

arr = df['X'].to_numpy().copy()  # copy the column into a writable NumPy array for faster iteration
for i, val in enumerate(arr[1:], start=1):
    if val == 0:
        arr[i] = arr[i-1] + 1
df['X'] = arr  # write the updated values back to the DataFrame
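
The loop is a running recurrence, so it can also be written with `itertools.accumulate` from the standard library. This is only a sketch of the same logic in another form; the helper name `step` is introduced here for illustration. Like the loop above, it leaves the first element unchanged.

```python
from itertools import accumulate

import pandas as pd

df = pd.DataFrame({'X': [1, 1, 1, 0, 0]})


def step(previous, current):
    # A zero becomes "already-updated previous value + 1"; other values pass through
    return previous + 1 if current == 0 else current


df['X'] = list(accumulate(df['X'].tolist(), step))
print(df['X'].tolist())  # [1, 1, 1, 2, 3]
```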