I have a pandas DataFrame like this:
import pandas as pd
df = pd.DataFrame({'X': [1, 1, 1, 0, 0]})
df
X
0 1
1 1
2 1
3 0
4 0
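Now I want to modify X according to the following rule:
If X == 0, set it to the previous row's (already updated) value + 1. So my final output should look like this: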
X
0 1
1 1
2 1
3 2
4 3
This can be done by iterating over the rows with iloc, looking at the current row and the previous row, and it works as expected:
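col = df.columns.get_loc('X')
for i in range(0, len(df)):
    current = df.iloc[i, col]
    # the first row has no previous row, so it is compared with itself
    previous = df.iloc[i - 1, col] if i > 0 else current
    if current == 0:
        # write into df itself; assigning to the row Series returned by df.iloc[i]
        # may only change a copy (it always does under pandas Copy-on-Write)
        df.iloc[i, col] = previous + 1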
I want a more efficient approach, so I tried the following code, but the output is not what I expected (X in the 5th row should be 3):
import numpy as np
conditions = [df["X"] == 0]
values = [df["X"].shift() + 1]
# keep the original value wherever X != 0
df['X'] = np.select(conditions, values, default=df["X"])
>>> df
X
0 1
1 1
2 1
3 2
4 1
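The reason the 5th row comes out as 1 shows up when the shifted column is printed on its own: `shift()` reads the original values, and the whole column is computed in a single pass, so row 4 sees the 0 that was originally in row 3 rather than the 2 that `np.select` writes there. A minimal check (the name `shifted_plus_one` is just for illustration):

```python
import pandas as pd

s = pd.Series([1, 1, 1, 0, 0])
# shift() looks at the ORIGINAL column, so row 4 sees the original 0 in row 3,
# not the updated value 2 that the vectorized assignment produces for row 3
shifted_plus_one = s.shift() + 1
print(shifted_plus_one.tolist())  # [nan, 2.0, 2.0, 2.0, 1.0]
```

A chain of zeros needs each row to see the already-updated value of the row before it, which a single `shift()` cannot provide.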
Answer 0 (score: 1)
You can try the following:
import numpy as np
import pandas as pd
df = pd.DataFrame({'X': [1, 1, 1, 0, 0]})
# values previous to zero
pe_zero = df.X.shift(-1).eq(0) * df.X # [0 0 1 0 0]
# 1 for each zero value, since each zero adds one to the previous value
eq_zero = df.X.eq(0)
# label consecutive runs of positive values (the value before the zeros plus the zeros themselves)
groups = pe_zero + eq_zero
consecutive = (groups.gt(0) != groups.gt(0).shift()).cumsum()
# find cumulative sum by groups
cumulative = groups.groupby(consecutive).cumsum()
# choose from cumulative when equals to zero else from original
result = np.where(eq_zero, cumulative, df.X)
print(result)
Output:
[1 1 1 2 3]
Update
For df = pd.DataFrame({'X': [1, 1, 1, 0, 0, 1, 1, 0, 0]})
it returns:
[1 1 1 2 3 1 1 2 3]
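To see how the pieces of Answer 0 fit together, the sketch below recomputes its intermediate Series for the updated example and prints them side by side. The `trace` frame is only for illustration and is not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 1, 1, 0, 0, 1, 1, 0, 0]})
pe_zero = df.X.shift(-1).eq(0) * df.X      # value just before each run of zeros
eq_zero = df.X.eq(0)                       # 1 for every zero
groups = pe_zero + eq_zero
consecutive = (groups.gt(0) != groups.gt(0).shift()).cumsum()
cumulative = groups.groupby(consecutive).cumsum()

# show every intermediate column side by side
trace = pd.DataFrame({'X': df.X, 'pe_zero': pe_zero, 'eq_zero': eq_zero.astype(int),
                      'groups': groups, 'consecutive': consecutive,
                      'cumulative': cumulative})
print(trace)
```

Within each run labelled by `consecutive`, the cumulative sum starts from the value before the zeros and adds 1 per zero. One caveat found by tracing by hand: if two runs of zeros are separated by a single non-zero row, as in `[1, 0, 2, 0]`, every entry of `groups` is positive, the runs merge into one group, and the last value comes out as 5 instead of 3.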
Answer 1 (score: 0)
You can also try this approach:
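# copy the column into a NumPy array: a plain array is much faster to loop over than df.iloc
arr = df['X'].to_numpy(copy=True)
for i, val in enumerate(arr[1:], start=1):
    if val == 0:
        arr[i] = arr[i - 1] + 1  # arr[i - 1] is already updated, so runs of zeros keep counting
df['X'] = arr  # write the result back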
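For comparison, here is a different fully vectorized formulation (a sketch of my own, not taken from either answer): every non-zero row starts a new run, and each row's result is the run's opening value plus the row's position inside the run. `fill_zero_runs` and `fill_zero_runs_loop` are hypothetical helper names, and the sketch assumes the column does not start with 0, because a leading run of zeros has no value in front of it.

```python
import pandas as pd


def fill_zero_runs(x: pd.Series) -> pd.Series:
    """Replace every 0 by (previous, already-updated value) + 1 without a Python loop.

    Assumes the first value is non-zero, so every run of zeros has a
    non-zero value in front of it.
    """
    is_zero = x.eq(0)
    # every non-zero row starts a new run; the zeros after it share its run id
    run_id = (~is_zero).cumsum()
    # the non-zero value that opens each run, broadcast to all rows of the run
    base = x.groupby(run_id).transform('first')
    # 0 for the opening row, then 1, 2, ... for the zeros that follow it
    offset = x.groupby(run_id).cumcount()
    return base + offset


def fill_zero_runs_loop(x: pd.Series) -> pd.Series:
    """Reference version: the NumPy array loop from Answer 1."""
    arr = x.to_numpy(copy=True)
    for i, val in enumerate(arr[1:], start=1):
        if val == 0:
            arr[i] = arr[i - 1] + 1
    return pd.Series(arr, index=x.index)


for data in ([1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1, 1, 0, 0], [1, 0, 2, 0]):
    s = pd.Series(data)
    print(fill_zero_runs(s).tolist(), fill_zero_runs_loop(s).tolist())
```

Both functions should print identical lists for all three inputs, including `[1, 0, 2, 0]`, because the run ids never let two separate runs of zeros merge.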