Python DataFrame counter on a column

Time: 2018-07-11 05:56:43

Tags: python-3.x pandas dataframe

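Column x in my DataFrame contains only 0s and 1s. I want to create a variable y that counts the zeros and resets whenever a 1 appears in x. I get an error: "The truth value of a Series is ambiguous."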

count=1                   
countList=[0]

for x in df['x']:
    if df['x'] == 0:
        count = count + 1
        df['y'] = count
    else:
        df['y'] = 1
        count = 1

1 answer:

Answer 0 (score: 3)

First, avoid looping in pandas when a vectorized solution exists, because loops are slow.
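As a side note on the error in the question, here is a minimal sketch of where it comes from. The Series `s` and the names `mask`, `raised`, `any_zero` and `all_zero` are made up for illustration. Comparing a whole column with 0 gives a boolean Series, and `if` needs a single truth value, so pandas raises a `ValueError`.

```python
import pandas as pd

s = pd.Series([1, 0, 0, 1])  # hypothetical sample, not the asker's data

# Comparing a Series with a scalar yields a boolean Series, not a single bool.
mask = s == 0

try:
    if mask:  # Python asks for one truth value here; pandas refuses to guess
        pass
    raised = False
except ValueError as err:
    raised = True
    message = str(err)

# Explicit reductions say which single answer is wanted.
any_zero = bool(mask.any())  # True: at least one element is 0
all_zero = bool(mask.all())  # False: not every element is 0
```

Inside a loop such as `for x in df['x']`, the loop variable `x` is already a single value, so `x == 0` is an ordinary boolean.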

I think you need to count consecutive 0 values:

import pandas as pd

df = pd.DataFrame({'x':[1,0,0,1,1,0,1,0,0,0,1,1,0,0,0,0,1]})

a = df['x'].eq(0)
b = a.cumsum()
df['y'] = (b-b.mask(a).ffill().fillna(0).astype(int))
print (df)

    x  y
0   1  0
1   0  1
2   0  2
3   1  0
4   1  0
5   0  1
6   1  0
7   0  1
8   0  2
9   0  3
10  1  0
11  1  0
12  0  1
13  0  2
14  0  3
15  0  4
16  1  0
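
For comparison, the same counter can probably also be written with a grouped cumulative sum. This is an alternative sketch, not the approach used in this answer, and the names `is_zero`, `group_id` and `y_alt` are made up for illustration. Every 1 starts a new group, and the zeros are counted within each group.

```python
import pandas as pd

df = pd.DataFrame({'x': [1,0,0,1,1,0,1,0,0,0,1,1,0,0,0,0,1]})

is_zero = df['x'].eq(0)
# Each 1 opens a new group, so the group id is the running count of 1s.
group_id = (~is_zero).cumsum()
# Within a group, count the zeros seen so far; the 1 itself contributes 0.
df['y_alt'] = is_zero.astype(int).groupby(group_id).cumsum()
print(df)
```

On this sample, `y_alt` should match the `y` column shown above.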

Details and explanation

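#sample starting with zeros (matches the table below), shows why fillna(0) is needed
df = pd.DataFrame({'x':[0,0,0,1,1,0,1,0,0,0,1,1,0,0,0,0,1]})
#compare by zero
a = df['x'].eq(0)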
#cumulative sum of mask
b = a.cumsum()
#replace Trues to NaNs
c = b.mask(a)
#forward fill NaNs
d = b.mask(a).ffill()
#First NaNs to 0 and cast to integers
e = b.mask(a).ffill().fillna(0).astype(int)
#subtract from cumulative sum Series
y = b - e
df = pd.concat([df['x'], a, b, c, d, e, y], axis=1, keys=('x','a','b','c','d','e', 'y'))
print (df)
    x      a   b     c     d   e  y
0   0   True   1   NaN   NaN   0  1
1   0   True   2   NaN   NaN   0  2
2   0   True   3   NaN   NaN   0  3
3   1  False   3   3.0   3.0   3  0
4   1  False   3   3.0   3.0   3  0
5   0   True   4   NaN   3.0   3  1
6   1  False   4   4.0   4.0   4  0
7   0   True   5   NaN   4.0   4  1
8   0   True   6   NaN   4.0   4  2
9   0   True   7   NaN   4.0   4  3
10  1  False   7   7.0   7.0   7  0
11  1  False   7   7.0   7.0   7  0
12  0   True   8   NaN   7.0   7  1
13  0   True   9   NaN   7.0   7  2
14  0   True  10   NaN   7.0   7  3
15  0   True  11   NaN   7.0   7  4
16  1  False  11  11.0  11.0  11  0
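
To sanity-check the vectorized result, here is a plain-Python reference loop. It is only a sketch: the helper `zero_run_counter` is hypothetical, and it resets to 0 on a 1, as in the output above (the question's code reset to 1). The sample is the one with leading zeros from the table above.

```python
import pandas as pd

def zero_run_counter(values):
    """Count consecutive zeros, resetting to 0 whenever a non-zero value appears."""
    counts = []
    count = 0
    for v in values:  # v is a single scalar, so `v == 0` is an ordinary bool
        count = count + 1 if v == 0 else 0
        counts.append(count)
    return counts

df = pd.DataFrame({'x': [0,0,0,1,1,0,1,0,0,0,1,1,0,0,0,0,1]})

# Vectorized version from the answer
a = df['x'].eq(0)
b = a.cumsum()
vectorized = (b - b.mask(a).ffill().fillna(0).astype(int)).tolist()

# Loop version, for checking only
looped = zero_run_counter(df['x'])
print(looped == vectorized)
```

The loop is fine for checking small inputs. On large columns the vectorized form is usually faster.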