Pandas: conditional counting when multiple conditions are met

Time: 2018-11-20 15:00:19

Tags: python pandas

I have a dataframe that looks like this:

                      dtm        f           C      A   B
0   2018-03-01 00:00:00 +0000   50.135  9.000000    0   0
1   2018-03-01 00:00:01 +0000   50.130  9.000000    0   0
2   2018-03-01 00:00:02 +0000   50.120  9.000000    0   0
3   2018-03-01 00:00:03 +0000   50.112  9.000000    0   0
4   2018-03-01 00:00:04 +0000   50.102  9.000000    0   0
5   2018-03-01 00:00:05 +0000   50.097  9.000000    0   0
6   2018-03-01 00:00:06 +0000   11.095  9.000000    0   0
7   2018-03-01 00:00:07 +0000   11.095  9.000000    0   0
8   2018-03-01 00:00:08 +0000   11.092  9.000000    0   0
9   2018-03-01 00:00:09 +0000   11.095  9.000000    0   0
10  2018-03-01 00:00:10 +0000   11.097  5.000000    0   0
11  2018-03-01 00:00:11 +0000   11.097  5.000000    0   0
12  2018-03-01 00:00:12 +0000   11.097  5.000000    0   0
13  2018-03-01 00:00:13 +0000   50.100  5.000000    0   0
14  2018-03-01 00:00:14 +0000   50.102  5.000000    0   0
15  2018-03-01 00:00:15 +0000   50.105  5.000000    0   0
16  2018-03-01 00:00:16 +0000   50.102  5.000000    0   0
17  2018-03-01 00:00:17 +0000   50.102  5.000000    0   0

A and B are two counters that work like this:

  • if (f >= 50) or (f < 50 and C < 8), then A increases by 1

  • if f < 50 and C > 8, then B increases by 1

The expected result is:

                      dtm           f         C     A   B
0   2018-03-01 00:00:00 +0000   50.135  9.000000    0   0
1   2018-03-01 00:00:01 +0000   50.130  9.000000    1   0
2   2018-03-01 00:00:02 +0000   50.120  9.000000    2   0
3   2018-03-01 00:00:03 +0000   50.112  9.000000    3   0
4   2018-03-01 00:00:04 +0000   50.102  9.000000    4   0
5   2018-03-01 00:00:05 +0000   50.097  9.000000    5   0
6   2018-03-01 00:00:06 +0000   11.095  9.000000    5   1
7   2018-03-01 00:00:07 +0000   11.095  9.000000    5   2   
8   2018-03-01 00:00:08 +0000   11.092  9.000000    5   3
9   2018-03-01 00:00:09 +0000   11.095  9.000000    5   4
10  2018-03-01 00:00:10 +0000   11.097  5.000000    6   4
11  2018-03-01 00:00:11 +0000   11.097  5.000000    7   4
12  2018-03-01 00:00:12 +0000   11.097  5.000000    8   4
13  2018-03-01 00:00:13 +0000   50.100  5.000000    9   4
14  2018-03-01 00:00:14 +0000   50.102  5.000000    10  4
15  2018-03-01 00:00:15 +0000   50.105  5.000000    11  4
16  2018-03-01 00:00:16 +0000   50.102  5.000000    12  4
17  2018-03-01 00:00:17 +0000   50.102  5.000000    13  4

Note that B keeps its value while A increases, and vice versa. They do not reset. Any ideas?

Thanks in advance!

2 answers:

Answer 0 (score: 5)

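For me, subtracting 1 with sub works well; to remove the -1 it leaves in the first row, add clip(lower=0) (spelled clip_lower(0) in pandas versions before 1.0):

m1 = (df.f >= 50) | ((df.f < 50) & (df.C < 8))
m2 = (df.f < 50) & (df.C > 8)

# running count of matching rows, shifted down by one and floored at zero
df['A'] = m1.cumsum().sub(1).clip(lower=0)
df['B'] = m2.cumsum().sub(1).clip(lower=0)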
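To see how the shift-and-clip step behaves on the question's data, here is a minimal sketch that rebuilds only the f and C columns (dtm is left out, and the names B_plain and B_shifted are mine, not from the answer). On this sample the shift is what makes A start at 0, because row 0 already satisfies the A condition. Row 0 does not satisfy the B condition, so the plain running count of m2 appears to be what matches the expected B column, while the shifted version ends one lower.

```python
import pandas as pd

# f and C values copied from the question's table (dtm omitted for brevity).
f = [50.135, 50.130, 50.120, 50.112, 50.102, 50.097,
     11.095, 11.095, 11.092, 11.095, 11.097, 11.097, 11.097,
     50.100, 50.102, 50.105, 50.102, 50.102]
C = [9.0] * 10 + [5.0] * 8
df = pd.DataFrame({"f": f, "C": C})

m1 = (df.f >= 50) | ((df.f < 50) & (df.C < 8))
m2 = (df.f < 50) & (df.C > 8)

# Row 0 satisfies m1, so subtracting 1 makes A start at 0.
A = m1.cumsum().sub(1).clip(lower=0)

# Row 0 does not satisfy m2, so the plain running count already starts at 0.
B_plain = m2.cumsum()
# Shifting B as well delays its first increment by one row.
B_shifted = m2.cumsum().sub(1).clip(lower=0)

print(A.tolist())
print(B_plain.tolist())
print(B_shifted.tolist())
```

Whether the shift belongs on a column therefore depends on whether the first row satisfies that column's condition, which is worth checking on real data.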

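Answer 1 (score: 5)

Assumptions

  • df.C > 8 was meant to be df.C >= 8, since that is the complement of df.C < 8.
  • The df.f < 50 test in (df.f < 50) & (df.C < 8) is not necessary, because it sits on the other side of an or with df.f >= 50.
  • Column 'A' starting at 0 seems odd and needs special handling. It would be cleaner to assume it starts at zero and begins incrementing at the first True.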
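The first two assumptions can be checked on a small grid of values. This is only a sketch: the arrays below are made-up test points around the two thresholds, not data from the question, and it assumes there are no NaN values (comparisons with NaN are always False, which would break the complement relationship).

```python
import numpy as np

# Hypothetical test points on both sides of f == 50 and C == 8.
f = np.array([49.0, 49.0, 49.0, 50.0, 50.0, 50.0])
C = np.array([7.0, 8.0, 9.0, 7.0, 8.0, 9.0])

# Condition for A as written in the question, and the shortened form.
full = (f >= 50) | ((f < 50) & (C < 8))
short = (f >= 50) | (C < 8)

# Condition for B as written (strict) and as assumed (non-strict).
b_strict = (f < 50) & (C > 8)
b_assumed = (f < 50) & (C >= 8)

print((full == short).all())        # the f < 50 test is redundant
print((~short == b_assumed).all())  # B is the complement only with C >= 8
print(np.flatnonzero(~short != b_strict))  # rows where the strict form differs
```

The only disagreement is the point with f < 50 and C exactly 8, which under the question's literal rules would increment neither counter.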

Inline with assign

a = df.f.values >= 50
b = df.C.values < 8
c = a | b

df.assign(A=c.cumsum(), B=(~c).cumsum())

                          dtm       f    C   A  B
0   2018-03-01 00:00:00 +0000  50.135  9.0   1  0
1   2018-03-01 00:00:01 +0000  50.130  9.0   2  0
2   2018-03-01 00:00:02 +0000  50.120  9.0   3  0
3   2018-03-01 00:00:03 +0000  50.112  9.0   4  0
4   2018-03-01 00:00:04 +0000  50.102  9.0   5  0
5   2018-03-01 00:00:05 +0000  50.097  9.0   6  0
6   2018-03-01 00:00:06 +0000  11.095  9.0   6  1
7   2018-03-01 00:00:07 +0000  11.095  9.0   6  2
8   2018-03-01 00:00:08 +0000  11.092  9.0   6  3
9   2018-03-01 00:00:09 +0000  11.095  9.0   6  4
10  2018-03-01 00:00:10 +0000  11.097  5.0   7  4
11  2018-03-01 00:00:11 +0000  11.097  5.0   8  4
12  2018-03-01 00:00:12 +0000  11.097  5.0   9  4
13  2018-03-01 00:00:13 +0000  50.100  5.0  10  4
14  2018-03-01 00:00:14 +0000  50.102  5.0  11  4
15  2018-03-01 00:00:15 +0000  50.105  5.0  12  4
16  2018-03-01 00:00:16 +0000  50.102  5.0  13  4
17  2018-03-01 00:00:17 +0000  50.102  5.0  14  4

In place

import numpy as np

a = df.f.values >= 50
b = df.C.values < 8
c = a | b

df[['A', 'B']] = np.column_stack([c, ~c]).cumsum(0)
df
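The in-place variant relies on cumsum(0) summing down the rows of the stacked array, so each column gets its own running count. A minimal sketch with a made-up four-element mask (not the question's data):

```python
import numpy as np

# Hypothetical mask standing in for c = a | b.
c = np.array([True, True, False, True])

# Two columns: the mask and its complement, one row per original element.
stacked = np.column_stack([c, ~c])

# axis=0 accumulates along rows, giving one running count per column.
result = stacked.cumsum(0)

print(stacked.shape)
print(result.tolist())
```

Since every row has exactly one True across the two columns, the two counts in row i always add up to i + 1.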

Condensed

c = (df.f.values >= 50) | (df.C.values < 8)

df.assign(A=c.cumsum(), B=(~c).cumsum())

With special handling

a = df.f.values >= 50
b = df.C.values < 8
c0 = a | b
c1 = ~c0

c0[0] = False
c1[0] = False

df.assign(A=c0.cumsum(), B=c1.cumsum())

                          dtm       f    C   A  B
0   2018-03-01 00:00:00 +0000  50.135  9.0   0  0
1   2018-03-01 00:00:01 +0000  50.130  9.0   1  0
2   2018-03-01 00:00:02 +0000  50.120  9.0   2  0
3   2018-03-01 00:00:03 +0000  50.112  9.0   3  0
4   2018-03-01 00:00:04 +0000  50.102  9.0   4  0
5   2018-03-01 00:00:05 +0000  50.097  9.0   5  0
6   2018-03-01 00:00:06 +0000  11.095  9.0   5  1
7   2018-03-01 00:00:07 +0000  11.095  9.0   5  2
8   2018-03-01 00:00:08 +0000  11.092  9.0   5  3
9   2018-03-01 00:00:09 +0000  11.095  9.0   5  4
10  2018-03-01 00:00:10 +0000  11.097  5.0   6  4
11  2018-03-01 00:00:11 +0000  11.097  5.0   7  4
12  2018-03-01 00:00:12 +0000  11.097  5.0   8  4
13  2018-03-01 00:00:13 +0000  50.100  5.0   9  4
14  2018-03-01 00:00:14 +0000  50.102  5.0  10  4
15  2018-03-01 00:00:15 +0000  50.105  5.0  11  4
16  2018-03-01 00:00:16 +0000  50.102  5.0  12  4
17  2018-03-01 00:00:17 +0000  50.102  5.0  13  4
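If this is needed in more than one place, the special-handling version could be wrapped in a small function. The sketch below is one possible shape, not part of the answer: running_counters, f_min, and c_max are hypothetical names, the demo frame is made up, and it keeps the same assumption that B counts the complement of A's condition.

```python
import pandas as pd


def running_counters(df, f_min=50, c_max=8):
    """Return a copy of df with running counters A and B (hypothetical helper).

    A counts rows where f >= f_min or C < c_max; B counts all other rows.
    The first row is never counted, so both counters start at 0.
    """
    hits_a = (df["f"].to_numpy() >= f_min) | (df["C"].to_numpy() < c_max)
    hits_b = ~hits_a  # complement, computed before row 0 is masked out
    if len(hits_a) > 0:
        hits_a[0] = False
        hits_b[0] = False
    return df.assign(A=hits_a.cumsum(), B=hits_b.cumsum())


# Made-up demo data, shorter than the question's table.
demo = pd.DataFrame({
    "f": [50.1, 50.2, 11.0, 11.0, 11.0, 50.0],
    "C": [9.0, 9.0, 9.0, 9.0, 5.0, 5.0],
})
out = running_counters(demo)
print(out)
```

Because assign returns a new frame, demo itself is left without the A and B columns.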