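I have a column x in a DataFrame that contains only 0s and 1s. I want to create a variable y that counts consecutive zeros and resets whenever a 1 appears in x. I'm getting an error: "The truth value of a Series is ambiguous."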
count=1
countList=[0]
for x in df['x']:
    if df['x'] == 0:
        count = count + 1
        df['y']= count
    else:
        df['y'] = 1
        count = 1
Answer 0 (score: 3)
First, avoid looping in pandas: when a vectorized solution exists, a loop is slow by comparison.
I think you need to count consecutive 0 values:
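import pandas as pd

df = pd.DataFrame({'x': [1,0,0,1,1,0,1,0,0,0,1,1,0,0,0,0,1]})
# True where x is 0
a = df['x'].eq(0)
# running total of zeros seen so far
b = a.cumsum()
# subtract the total as of the most recent 1, so the count restarts after each 1
df['y'] = b - b.mask(a).ffill().fillna(0).astype(int)
print(df)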
    x  y
0   1  0
1   0  1
2   0  2
3   1  0
4   1  0
5   0  1
6   1  0
7   0  1
8   0  2
9   0  3
10  1  0
11  1  0
12  0  1
13  0  2
14  0  3
15  0  4
16  1  0
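For reference, the error in the question comes from `if df['x'] == 0:`, which compares the whole column and produces a boolean Series that `if` cannot reduce to a single True or False. Below is a minimal sketch of a corrected loop, assuming the same convention as the output above (y is 0 wherever x is 1). The names `loop_y` and `vectorized_y` are illustrative only, and the loop is shown to explain the error, not as a recommended approach.

```python
import pandas as pd

df = pd.DataFrame({'x': [1,0,0,1,1,0,1,0,0,0,1,1,0,0,0,0,1]})

# Compare each scalar value inside the loop, not the whole column
count = 0
loop_y = []
for value in df['x']:
    count = count + 1 if value == 0 else 0
    loop_y.append(count)

# Vectorized version from the answer, for comparison
a = df['x'].eq(0)
b = a.cumsum()
vectorized_y = b - b.mask(a).ffill().fillna(0).astype(int)

print(loop_y == vectorized_y.tolist())  # True
```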
Details and explanation:
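# sample for this walkthrough: x starts with three zeros (as in the table below),
# so the leading NaNs handled by fillna(0) are visible
df = pd.DataFrame({'x': [0,0,0,1,1,0,1,0,0,0,1,1,0,0,0,0,1]})
# compare with zero: True where x == 0
a = df['x'].eq(0)
# cumulative sum of the mask: running total of zeros
b = a.cumsum()
# replace values where the mask is True with NaN
c = b.mask(a)
# forward fill the NaNs with the last total seen at a 1
d = b.mask(a).ffill()
# leading NaNs (zeros before the first 1) become 0; cast to integers
e = b.mask(a).ffill().fillna(0).astype(int)
# subtract from the cumulative sum Series
y = b - e
df = pd.concat([df['x'], a, b, c, d, e, y], axis=1, keys=('x','a','b','c','d','e', 'y'))
print(df)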
    x      a   b     c     d   e  y
0   0   True   1   NaN   NaN   0  1
1   0   True   2   NaN   NaN   0  2
2   0   True   3   NaN   NaN   0  3
3   1  False   3   3.0   3.0   3  0
4   1  False   3   3.0   3.0   3  0
5   0   True   4   NaN   3.0   3  1
6   1  False   4   4.0   4.0   4  0
7   0   True   5   NaN   4.0   4  1
8   0   True   6   NaN   4.0   4  2
9   0   True   7   NaN   4.0   4  3
10  1  False   7   7.0   7.0   7  0
11  1  False   7   7.0   7.0   7  0
12  0   True   8   NaN   7.0   7  1
13  0   True   9   NaN   7.0   7  2
14  0   True  10   NaN   7.0   7  3
15  0   True  11   NaN   7.0   7  4
16  1  False  11  11.0  11.0  11  0
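The steps above can also be folded into a small helper. The sketch below is an alternative formulation and not the answer's own code: it groups rows by a running count of nonzero values and takes a cumulative sum of the zero mask within each group. The function name `count_consecutive_zeros` is hypothetical, and the sample is the x column from the table above.

```python
import pandas as pd

def count_consecutive_zeros(x: pd.Series) -> pd.Series:
    """Running count of consecutive zeros; 0 wherever x is nonzero."""
    is_zero = x.eq(0)
    # Every nonzero value starts a new group, so the cumulative sum
    # of the zero mask restarts there
    group_id = x.ne(0).cumsum()
    return is_zero.astype(int).groupby(group_id).cumsum()

x = pd.Series([0,0,0,1,1,0,1,0,0,0,1,1,0,0,0,0,1])
y = count_consecutive_zeros(x)
print(y.tolist())
```

On this sample the result should match the y column of the table; it would be worth checking against the mask/ffill version on your own data before relying on it.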