Question

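I have a column that iterates from 1 to 3. I need a cycle number, as shown in the middle column of the table. How can I get that second column of numbers using pandas?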

Here is the table:

column  | I need   |Note
-----------------------------------------------------------------------
2       | 1        |first cycle although not starting from 1
3       | 1        |first cycle although not starting from 1
-----------------------------------------------------------------------
1       | 2        |second cycle
2       | 2        |second cycle
3       | 2        |second cycle
-----------------------------------------------------------------------
1       | 3        |
2       | 3        |
3       | 3        |
-----------------------------------------------------------------------
1       | 4        |
2       | 4        |
3       | 4        |
-----------------------------------------------------------------------
1       | 5        |
2       | 5        |
3       | 5        |
-----------------------------------------------------------------------
1       | 6        |
2       | 6        |
3       | 6        |
-----------------------------------------------------------------------
1       | 7        |7th cycle and does not have to end in 3
2       | 7        |

Answer 1

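For the sample data, take the first difference with Series.diff, test whether it is less than 0 with Series.lt, and finally take the cumulative sum with Series.cumsum: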

df['new'] = df['column'].diff().lt(0).cumsum() + 1
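To make the one-liner above easier to follow, here is a minimal sketch that rebuilds the sample data from the question and splits the chain into named steps. The intermediate names step and is_reset are only illustrative. It assumes a new cycle always begins with a drop in value, so a cycle that restarts at the same or a higher value would not be detected.

```python
import pandas as pd

# Sample data from the question: cycles of 1..3, with the first and last cycles incomplete.
df = pd.DataFrame({"column": [2, 3] + [1, 2, 3] * 5 + [1, 2]})

step = df["column"].diff()         # NaN for the first row, then the change from the previous row
is_reset = step.lt(0)              # True where the value drops; NaN compares as False
df["new"] = is_reset.cumsum() + 1  # count the drops seen so far, starting the numbering at 1

print(df)
```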

If the values are strings, you can encode them as numbers with a dictionary via Series.map:

df['new'] = df['column'].map({'1':0, '2':2, '3':3}).diff().lt(0).cumsum() + 1

print (df)
    column  I need  new
0        2       1    1
1        3       1    1
2        1       2    2
3        2       2    2
4        3       2    2
5        1       3    3
6        2       3    3
7        3       3    3
8        1       4    4
9        2       4    4
10       3       4    4
11       1       5    5
12       2       5    5
13       3       5    5
14       1       6    6
15       2       6    6
16       3       6    6
17       1       7    7
18       2       7    7

EDIT: You can use enumerate to create the dictionary for map from all values in one cycle:

d = {v:k for k, v in enumerate(['1','2','3'])}
# If possible, build the dictionary from all unique values - check their order first
#print (df['column'].unique())
#d = {v:k for k, v in enumerate(df['column'].unique())}
df['new'] = df['column'].map(d).diff().lt(0).cumsum() + 1
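As a rough illustration of the string case, the sketch below uses a shortened, hypothetical version of the sample data with string values. The list order is an assumption standing in for the known cycle order. The sketch also shows why the order of unique() should be checked first: when the data starts mid-cycle, the unique values come back in order of appearance, not in cycle order.

```python
import pandas as pd

# Shortened, hypothetical sample with string values; it starts mid-cycle at '2'.
df = pd.DataFrame({"column": list("23") + list("123") * 2 + list("12")})

# unique() keeps the order of first appearance, so here it is not the cycle order.
seen = list(df["column"].unique())       # ['2', '3', '1']

order = ["1", "2", "3"]                  # assumed cycle order, written out by hand
d = {v: k for k, v in enumerate(order)}  # {'1': 0, '2': 1, '3': 2}

df["new"] = df["column"].map(d).diff().lt(0).cumsum() + 1
print(df)
```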

Answer 2

I think this is the simplest solution, since you only need to define the value/string that starts a cycle:

start_val = 1 # the value / string which starts the cycle
df['new'] = ((df['column'] == start_val) | pd.isna(df['column'].shift())).cumsum()
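A small sketch of this approach on the sample data from the question, with the two conditions given illustrative names (is_start and is_first_row are not part of the answer). It assumes every cycle after the first contains start_val. A cycle that skips the start value would be merged into the previous one.

```python
import pandas as pd

# Sample data from the question: cycles of 1..3, with the first and last cycles incomplete.
df = pd.DataFrame({"column": [2, 3] + [1, 2, 3] * 5 + [1, 2]})

start_val = 1                                 # the value that starts a cycle
is_start = df["column"] == start_val          # True on every row that opens a cycle
is_first_row = pd.isna(df["column"].shift())  # True only where there is no previous row
df["new"] = (is_start | is_first_row).cumsum()

print(df)
```

Because the comparison is a plain equality test, this works for labels that have no natural order, as long as the starting label is known.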

If df['column'] contains NaN, add .fillna(0) (or .fillna('') for strings) before .shift()
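The effect of the missing-value fix can be seen in a small sketch with hypothetical data containing one NaN. The names naive and fixed are only for comparison. Without the fill, the row after the NaN has a missing shifted value and is wrongly treated as the start of a new cycle.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in the middle of the first cycle.
df = pd.DataFrame({"column": [2, np.nan, 3, 1, 2, 3, 1]})
start_val = 1
is_start = df["column"] == start_val

# Without fillna: the row after the NaN is also flagged, because its shifted value is NaN.
naive = (is_start | df["column"].shift().isna()).cumsum()

# With fillna(0) before shift: only the very first row has a missing predecessor.
fixed = (is_start | df["column"].fillna(0).shift().isna()).cumsum()

print(pd.DataFrame({"column": df["column"], "naive": naive, "fixed": fixed}))
```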

How to label values with a cycle number in Python pandas
