Pandas / NumPy: group value changes and derivative value changes around 0

Asked: 2018-10-30 17:18:26

Tags: python pandas numpy

I have a series of values (a Pandas DataFrame or a NumPy array):

import numpy as np
import pandas as pd
vals = [0,1,3,4,5,5,4,2,1,0,-1,-2,-3,-2,3,5,8,4,2,0,-1,-3,-8,-20,-10,-5,-2,-1,0,1,2,3,5,6,8,4,3]
df = pd.DataFrame({'val': vals})

I want to classify/group the values into 4 categories:

  1. Increasing above 0
  2. Decreasing above 0
  3. Increasing below 0
  4. Decreasing below 0

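My current approach with Pandas is to flag where the diff is greater than or less than 0, and then to flag whether each value is above or below 0.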

df['above_zero'] = np.where(df['val'] >= 0, 1, 0)
df['below_zero'] = np.where(df['val'] < 0, 1, 0)
df['diffs'] = df['val'].diff()
df['diff_above_zero'] = np.where(df['diffs'] >= 0, 1, 0)
df['diff_below_zero'] = np.where(df['diffs'] < 0, 1, 0)

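This produces the flags I need, but now I am trying to find a way to turn these columns into ascending group numbers that increase as soon as one of the four conditions changes.

The desired output looks like this (* the group column was typed by hand, so it may differ from a computed result):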

id   val  above_zero  below_zero  diffs  diff_above_zero  diff_below_zero  group
0     0           1           0    0.0                1                0      0
1     1           1           0    1.0                1                0      0
2     3           1           0    2.0                1                0      0
3     4           1           0    1.0                1                0      0
4     5           1           0    1.0                1                0      0
5     5           1           0    0.0                1                0      0
6     4           1           0   -1.0                0                1      1
7     2           1           0   -2.0                0                1      1
8     1           1           0   -1.0                0                1      1
9     0           1           0   -1.0                0                1      1
10   -1           0           1   -1.0                0                1      2
11   -2           0           1   -1.0                0                1      2
12   -3           0           1   -1.0                0                1      2
13   -2           0           1    1.0                1                0      3
14    3           1           0    5.0                1                0      4
15    5           1           0    2.0                1                0      4
16    8           1           0    3.0                1                0      4
17    4           1           0   -4.0                0                1      5
18    2           1           0   -2.0                0                1      5
19    0           1           0   -2.0                0                1      5
20   -1           0           1   -1.0                0                1      6
21   -3           0           1   -2.0                0                1      6
22   -8           0           1   -5.0                0                1      6
23  -20           0           1  -12.0                0                1      6
24  -10           0           1   10.0                1                0      7
25   -5           0           1    5.0                1                0      7
26   -2           0           1    3.0                1                0      7
27   -1           0           1    1.0                1                0      7
28    0           1           0    1.0                1                0      8
29    1           1           0    1.0                1                0      8
30    2           1           0    1.0                1                0      8
31    3           1           0    1.0                1                0      8
32    5           1           0    2.0                1                0      8
33    6           1           0    1.0                1                0      8
34    8           1           0    2.0                1                0      8
35    4           1           0   -4.0                0                1      9
36    3           1           0   -1.0                0                1      9

Any help on how to solve this efficiently would be appreciated. Thank you!

2 answers:

Answer 0: (score: 2)

Setup

g1 = ['above_zero', 'below_zero', 'diff_above_zero', 'diff_below_zero']

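You can simply index all of the boolean columns and use shift:

c = df.loc[:, g1]
# a row starts a new group when any flag differs from the previous row
(c != c.shift().fillna(c)).any(axis=1).cumsum()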

0     0
1     0
2     0
3     0
4     0
5     0
6     1
7     1
8     1
9     1
10    2
11    2
12    2
13    3
14    4
15    4
16    4
17    5
18    5
19    5
20    6
21    6
22    6
23    6
24    7
25    7
26    7
27    7
28    8
29    8
30    8
31    8
32    8
33    8
34    8
35    9
36    9
dtype: int32
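
For reference, here is a self-contained sketch of the same shift/cumsum idea. It rests on one assumption taken from the desired output in the question: the first diff is treated as 0 rather than NaN. With the NaN left in place, both diff flags are 0 in row 0 and that row ends up in a group of its own. The helper name label_runs is made up for illustration, and only two flag columns are built because below_zero and diff_below_zero are the complements of the other two.

```python
import numpy as np
import pandas as pd

vals = [0, 1, 3, 4, 5, 5, 4, 2, 1, 0, -1, -2, -3, -2, 3, 5, 8, 4, 2, 0,
        -1, -3, -8, -20, -10, -5, -2, -1, 0, 1, 2, 3, 5, 6, 8, 4, 3]


def label_runs(flags: pd.DataFrame) -> pd.Series:
    """Number consecutive runs of identical rows, starting at 0."""
    # A row starts a new run when any column differs from the previous row.
    changed = flags.ne(flags.shift()).any(axis=1)
    # The first row has no predecessor, so it must not count as a change.
    changed.iloc[0] = False
    return changed.cumsum()


df = pd.DataFrame({'val': vals})
# Assumption: treat the first diff as 0 instead of NaN.
diffs = df['val'].diff().fillna(0)
flags = pd.DataFrame({
    'above_zero': (df['val'] >= 0).astype(int),
    'diff_above_zero': (diffs >= 0).astype(int),
})
df['group'] = label_runs(flags)
```

Once the group column exists, something like df.groupby('group')['val'].agg(['min', 'max']) should summarise each run.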

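Answer 1: (score: 1)

The following code produces two columns: c1 and c2.

The values of c1 correspond to the following 4 categories:

  • 0 means below zero and increasing
  • 1 means above zero and increasing
  • 2 means below zero and decreasing
  • 3 means above zero and decreasing

c2 is the ascending group number, which increases each time the condition (i.e. c1) changes, as requested. Thanks to @user3483203 for the idea of combining shift with cumsum.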

# calculate difference
df["diff"] = df['val'].diff()
# set first value in column 'diff' to 0 (as previous step sets it to NaN)
df.loc[0, 'diff'] = 0
df["c1"] = (df['val'] >= 0).astype(int) + (df["diff"] < 0).astype(int) * 2
df["c2"] = (df["c1"] != df["c1"].shift().fillna(df["c1"])).astype(int).cumsum()

Result:

    val  diff  c1  c2
0     0   0.0   1   0
1     1   1.0   1   0
2     3   2.0   1   0
3     4   1.0   1   0
4     5   1.0   1   0
5     5   0.0   1   0
6     4  -1.0   3   1
7     2  -2.0   3   1
8     1  -1.0   3   1
9     0  -1.0   3   1
10   -1  -1.0   2   2
11   -2  -1.0   2   2
12   -3  -1.0   2   2
13   -2   1.0   0   3
14    3   5.0   1   4
15    5   2.0   1   4
16    8   3.0   1   4
17    4  -4.0   3   5
18    2  -2.0   3   5
19    0  -2.0   3   5
20   -1  -1.0   2   6
21   -3  -2.0   2   6
22   -8  -5.0   2   6
23  -20 -12.0   2   6
24  -10  10.0   0   7
25   -5   5.0   0   7
26   -2   3.0   0   7
27   -1   1.0   0   7
28    0   1.0   1   8
29    1   1.0   1   8
30    2   1.0   1   8
31    3   1.0   1   8
32    5   2.0   1   8
33    6   1.0   1   8
34    8   2.0   1   8
35    4  -4.0   3   9
36    3  -1.0   3   9
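
Since the question also mentions NumPy arrays, the same encoding can probably be written without Pandas. The sketch below is not part of the original answer. It assumes the first element's diff is taken as 0 (via the prepend argument of np.diff, which requires NumPy 1.16 or later) and reuses the names c1 and c2 from above.

```python
import numpy as np

vals = np.array([0, 1, 3, 4, 5, 5, 4, 2, 1, 0, -1, -2, -3, -2, 3, 5, 8, 4,
                 2, 0, -1, -3, -8, -20, -10, -5, -2, -1, 0, 1, 2, 3, 5, 6,
                 8, 4, 3])

# Assumption: the first element has no predecessor, so its diff is taken as 0.
diffs = np.diff(vals, prepend=vals[0])

# Same encoding as c1 above: +1 if val >= 0, +2 if the diff is negative.
c1 = (vals >= 0).astype(int) + 2 * (diffs < 0).astype(int)

# A new group starts wherever c1 differs from its predecessor;
# the first element never counts as a change.
changed = np.concatenate(([False], c1[1:] != c1[:-1]))
c2 = np.cumsum(changed)
```

This avoids building the intermediate DataFrame columns, which may matter for long arrays, though it has not been benchmarked here.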