Replace the values in one column of a DataFrame

Time: 2019-05-28 07:21:07

Tags: python-3.x pandas counter

import pandas as pd
import numpy as np
import ast

pd.options.display.max_columns = 20

My DataFrame column season looks like this (first 20 entries):

      season
0     2006-07
1     2007-08
2     2008-09
3     2009-10
4     2010-11
5     2011-12
6     2012-13
7     2013-14
8     2014-15
9     2015-16
10    2016-17
11    2017-18
12    2018-19
13     Career
14     season
15    2018-19
16     Career
17     season
18    2017-18
19    2018-19

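Each block starts at season and ends at Career. I want to replace the years with numbers that start at 1 and run up to Career. I want something like this: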

      season
0     1
1     2
2     3
3     4
4     5
5     6
6     7
7     8
8     9
9     10
10    11
11    12
12    13
13     Career
14     season
15    1
16     Career
17     season
18    1
19    2

So the count should reset every time season appears in the column, and it should stop at each Career.

1 answer:

Answer 0 (score: 5)

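Build a mask with Series.isin, compare it with its shifted values to create consecutive groups, and use GroupBy.cumcount as the counter within each group. np.where then keeps the Career and season rows unchanged: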

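# True for the marker rows ('Career' / 'season'), False for year rows
s = df['season'].isin(['Career', 'season'])
# s.ne(s.shift()).cumsum() labels each consecutive run of the mask;
# cumcount() + 1 numbers the rows inside each run starting at 1
df['new'] = np.where(s, df['season'], df.groupby(s.ne(s.shift()).cumsum()).cumcount() + 1)
print(df)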
     season     new
0   2006-07       1
1   2007-08       2
2   2008-09       3
3   2009-10       4
4   2010-11       5
5   2011-12       6
6   2012-13       7
7   2013-14       8
8   2014-15       9
9   2015-16      10
10  2016-17      11
11  2017-18      12
12  2018-19      13
13   Career  Career
14   season  season
15  2018-19       1
16   Career  Career
17   season  season
18  2017-18       1
19  2018-19       2
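
To see why the grouping key works, it can help to print the intermediate values. The sketch below uses a shortened version of the sample column and shows the mask, the run identifiers, and the per-run counter side by side. The data and the names changed, group_id, counter, and steps are assumptions made for illustration and do not appear in the original answer.

```python
import pandas as pd

# Shortened sample column; illustrative values, not the asker's full data.
df = pd.DataFrame({'season': ['2016-17', '2017-18', 'Career', 'season', '2018-19', 'Career']})

# True on marker rows ('Career' / 'season'), False on year rows.
s = df['season'].isin(['Career', 'season'])

# True wherever the mask differs from the previous row, i.e. a new run starts.
# The first row compares against NaN, so it always starts a run.
changed = s.ne(s.shift())

# The running total of run starts gives one id per consecutive run.
group_id = changed.cumsum()

# Position of each row within its run, shifted so it starts at 1.
counter = df.groupby(group_id).cumcount() + 1

steps = pd.DataFrame({'season': df['season'], 'mask': s,
                      'group_id': group_id, 'counter': counter})
print(steps)
```

Marker rows also receive a counter value here, which is why the answer wraps the result in np.where (or assigns through the ~s mask) so that only the year rows are overwritten.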

To replace the values in the season column itself:

s = df['season'].isin(['Career', 'season'])
# overwrite only the year rows; the marker rows keep their text
df.loc[~s, 'season'] = df.groupby(s.ne(s.shift()).cumsum()).cumcount() + 1
print(df)
    season
0        1
1        2
2        3
3        4
4        5
5        6
6        7
7        8
8        9
9       10
10      11
11      12
12      13
13  Career
14  season
15       1
16  Career
17  season
18       1
19       2
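
As a sanity check, the same numbering can be written as a plain loop and compared with the vectorized result. This is only a sketch and not part of the original answer: the helper number_seasons, the MARKERS set, and the sample data are hypothetical. The column is created with dtype=object on the assumption that it needs to hold integers and strings together.

```python
import pandas as pd

# Rows that delimit a block; taken from the values shown in the question.
MARKERS = {'Career', 'season'}

def number_seasons(values):
    """Number year rows 1..n within each block and keep marker rows as they are."""
    out = []
    count = 0
    for v in values:
        if v in MARKERS:
            count = 0          # a marker row resets the counter
            out.append(v)
        else:
            count += 1         # a year row gets the next number
            out.append(count)
    return out

# Illustrative data; object dtype so the column can mix ints and strings.
df = pd.DataFrame({'season': ['2016-17', '2017-18', '2018-19', 'Career', 'season',
                              '2018-19', 'Career', 'season', '2017-18', '2018-19']},
                  dtype=object)

# Loop-based reference result, computed before the column is overwritten.
expected = number_seasons(df['season'])

# Vectorized replacement, as in the answer.
s = df['season'].isin(['Career', 'season'])
df.loc[~s, 'season'] = df.groupby(s.ne(s.shift()).cumsum()).cumcount() + 1

# Compare the two approaches element by element.
print(df['season'].tolist() == expected)
```

The loop is easier to read but runs row by row in Python, so for long columns the groupby version is likely the better choice.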