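I have a column that iterates from 1 to 3. I need a cycle number, shown in the middle column below. How can I get that second column of numbers using pandas?
Here is the table: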
column | I need |Note
-----------------------------------------------------------------------
2 | 1 |first cycle although not starting from 1
3 | 1 |first cycle although not starting from 1
-----------------------------------------------------------------------
1 | 2 |second cycle
2 | 2 |second cycle
3 | 2 |second cycle
-----------------------------------------------------------------------
1 | 3 |
2 | 3 |
3 | 3 |
-----------------------------------------------------------------------
1 | 4 |
2 | 4 |
3 | 4 |
-----------------------------------------------------------------------
1 | 5 |
2 | 5 |
3 | 5 |
-----------------------------------------------------------------------
1 | 6 |
2 | 6 |
3 | 6 |
-----------------------------------------------------------------------
1 | 7 |7th cycle, which does not have to end in 3
2 | 7 |
Answer 0 (score: 2)
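For the sample data, take the first difference with Series.diff, test where it is less than 0 (the value dropped, so a new cycle starts), and finally take the cumulative sum with Series.cumsum: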
df['new'] = df['column'].diff().lt(0).cumsum() + 1
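To make the chain easier to follow, here is a minimal sketch that splits it into its intermediate steps on the sample data. The DataFrame construction and the names step and is_reset are only illustrative, and the approach assumes the values increase within each cycle:

```python
import pandas as pd

# Sample data from the question; the column name 'column' follows the question.
df = pd.DataFrame({'column': [2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3,
                              1, 2, 3, 1, 2, 3, 1, 2]})

step = df['column'].diff()         # difference to the previous row; NaN for the first row
is_reset = step.lt(0)              # True where the value dropped, i.e. a new cycle starts
df['new'] = is_reset.cumsum() + 1  # count the resets; + 1 so numbering starts at 1

print(df['new'].tolist())
```

The first row compares as False because NaN is not less than 0, so an incomplete first cycle is still numbered 1.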
If the values are strings, you can encode them as numbers with a dictionary via Series.map:
df['new'] = df['column'].map({'1':1, '2':2, '3':3}).diff().lt(0).cumsum() + 1
print (df)
column I need new
0 2 1 1
1 3 1 1
2 1 2 2
3 2 2 2
4 3 2 2
5 1 3 3
6 2 3 3
7 3 3 3
8 1 4 4
9 2 4 4
10 3 4 4
11 1 5 5
12 2 5 5
13 3 5 5
14 1 6 6
15 2 6 6
16 3 6 6
17 1 7 7
18 2 7 7
EDIT: You can use enumerate over the ordered list of all values in one cycle to build the dictionary for map:
d = {v:k for k, v in enumerate(['1','2','3'])}
#if possible, build the dictionary from all unique values - check their order first
#print (df['column'].unique())
#d = {v:k for k, v in enumerate(df['column'].unique())}
df['new'] = df['column'].map(d).diff().lt(0).cumsum() + 1
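As a small self-contained sketch of the string case: the shortened sample data and the name cycle_order are assumptions for illustration, and the list must give the labels in the order they occur within one cycle.

```python
import pandas as pd

# Shortened sample with string values (illustrative data).
df = pd.DataFrame({'column': ['2', '3', '1', '2', '3', '1', '2']})

# Labels in the order they appear inside one cycle (assumed known up front).
cycle_order = ['1', '2', '3']
d = {v: k for k, v in enumerate(cycle_order)}  # {'1': 0, '2': 1, '3': 2}

# Encode the labels as positions, then count the drops as before.
df['new'] = df['column'].map(d).diff().lt(0).cumsum() + 1

print(df['new'].tolist())  # [1, 1, 2, 2, 2, 3, 3]
```

Building the list by hand avoids the ordering problem of unique(), which returns values in order of first appearance (here '2', '3', '1').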
Answer 1 (score: 1)
I think this is the simplest solution, because you only need to define the value/string that starts a cycle:
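import pandas as pd
start_val = 1 # the value / string which starts the cycle
# a new cycle begins at start_val, or on the first row (whose shifted value is NaN)
df['new'] = ((df['column'] == start_val) | pd.isna(df['column'].shift())).cumsum()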
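If df['column'] contains NaN, add .fillna(0) (or .fillna('') for strings) before .shift(). Otherwise the row following each NaN would be counted as the start of a new cycle.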
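A minimal sketch of the NaN case, comparing the result with and without the fill. The sample data and the names naive and col are illustrative, and the placeholder 0 is an assumption that must differ from start_val:

```python
import numpy as np
import pandas as pd

# Illustrative data with a missing value in the middle of the second cycle.
df = pd.DataFrame({'column': [2, 3, 1, np.nan, 3, 1, 2]})
start_val = 1  # the value which starts a cycle

# Without the fill, the row after the NaN is wrongly flagged as a cycle start.
naive = ((df['column'] == start_val) | pd.isna(df['column'].shift())).cumsum()

# With the fill, only the very first row has a NaN after shifting.
col = df['column'].fillna(0)  # placeholder 0 is an assumption; it must not equal start_val
df['new'] = ((col == start_val) | pd.isna(col.shift())).cumsum()

print(naive.tolist())      # [1, 1, 2, 2, 3, 4, 4]
print(df['new'].tolist())  # [1, 1, 2, 2, 2, 3, 3]
```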