Question

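I'm currently working on a bulk data pre-processing framework in pandas, and since I'm relatively new to pandas, I can't seem to solve this problem: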

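Given: a dataset with two columns: col_1, col_2

Required: a new column req_col whose value increments by one whenever
a. the values in col_1 are not consecutive
or
b. the value in col_2 increments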

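Note:

col_2 always starts from 1, its values are always increasing, and no value is ever missing (always consecutive), e.g. 1,2,2,3,3,4,5,6,6,6,7,8,8,9 ...
col_1 always starts from 0 and its values are always increasing, but some values may be missing (they need not be consecutive), e.g. 0,1,2,2,3,6,6,6,10,10,10 ...

Expected answer: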

col_1  col_2  req_col      #Changes in req_col explained below
 0        1        1
 0        1        1
 0        2        2       #because col_2 value has incremented
 1        2        2
 1        2        2
 3        2        3       #because '3' is not consecutive to '1' in col_1
 3        3        4       #because of increment in col_2
 5        3        5       #because '5' is not consecutive to '3' in col_1
 6        4        6       #because of increment in col_2 and so on...
 6        4        6

Answer 1

Try:

df['req_col'] = (df['col_1'].diff().gt(1) | # col_1 is not consecutive
                 df['col_2'].diff().ne(0)   # col_2 has changed
                ).cumsum()

Output:

0    1
1    1
2    2
3    2
4    2
5    3
6    4
7    5
8    6
9    6
dtype: int32
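
To make it easier to see why this works, here is a minimal self-contained sketch that rebuilds the sample data from the question and splits the expression into its two boolean masks. The names gap_in_col_1 and step_in_col_2 are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the expected-answer table in the question.
df = pd.DataFrame({
    "col_1": [0, 0, 0, 1, 1, 3, 3, 5, 6, 6],
    "col_2": [1, 1, 2, 2, 2, 2, 3, 3, 4, 4],
})

# True where col_1 jumps by more than 1 (a gap). The first row's diff is NaN,
# and NaN > 1 evaluates to False.
gap_in_col_1 = df["col_1"].diff().gt(1)

# True where col_2 differs from the previous row. The first row's diff is NaN,
# and NaN != 0 evaluates to True, which is what starts the counter at 1.
step_in_col_2 = df["col_2"].diff().ne(0)

# Every True marks the start of a new group, so the running sum of the
# combined mask is the group number.
df["req_col"] = (gap_in_col_1 | step_in_col_2).cumsum()

print(df)
```

Because the two masks are combined with |, a row where both conditions hold at once still increases req_col by only one.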

pandas: How to increment a new column based on the increment and consecutive properties of 2 other columns?
