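Row counter

Time: 2017-12-01 21:38:07

Tags: pandas numpy pandas-groupby

I am trying to create a new variable that counts how many times the same ID has been seen so far over time.

I need to go from this data frame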

   id     clae6  year    quarter        
     1  475230.0  2007          1                   
     1  475230.0  2007          2                     
     1  475230.0  2007          3                     
     1  475230.0  2007          4                    
     1  475230.0  2008          1
     1  475230.0  2008          2         
     2  475230.0  2007          1                    
     2  475230.0  2007          2                    
     2  475230.0  2007          3                  
     2  475230.0  2007          4                   
     2  475230.0  2008          1     
     3  475230.0  2010          1     
     3  475230.0  2010          2     
     3  475230.0  2010          3     
     3  475230.0  2010          4     

to this one

   id     clae6  year    quarter     new_variable      
     1  475230.0  2007          1         1   
     1  475230.0  2007          2         2            
     1  475230.0  2007          3         3            
     1  475230.0  2007          4         4           
     1  475230.0  2008          1         5
     1  475230.0  2008          2         6
     2  475230.0  2007          1         1           
     2  475230.0  2007          2         2           
     2  475230.0  2007          3         3         
     2  475230.0  2007          4         4          
     2  475230.0  2008          1         5
     3  475230.0  2010          1         1
     3  475230.0  2010          2         2
     3  475230.0  2010          3         3
     3  475230.0  2010          4         4 

I am using the following code, but maybe there is an easier way (I work with a lot of records, so I am looking for faster code):

df['control'] = 1
df['new_variable'] = df.groupby(['id'])['control'].cumsum()
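
For anyone who wants to reproduce this, here is a minimal sketch of the setup. It assumes the frame is built by hand from the sample rows above; the construction code is not part of the original question.

```python
import pandas as pd

# Sample rows copied from the tables above; building the frame by hand is an assumption.
df = pd.DataFrame({
    "id":      [1] * 6 + [2] * 5 + [3] * 4,
    "clae6":   [475230.0] * 15,
    "year":    [2007] * 4 + [2008] * 2 + [2007] * 4 + [2008] + [2010] * 4,
    "quarter": [1, 2, 3, 4, 1, 2, 1, 2, 3, 4, 1, 1, 2, 3, 4],
})

# Helper column of ones, then a running sum within each id.
df["control"] = 1
df["new_variable"] = df.groupby("id")["control"].cumsum()

print(df[["id", "year", "quarter", "new_variable"]])
```

The counter restarts at 1 whenever the id changes, which matches the desired table.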

2 answers:

Answer 0: (score: 2)

You can use rank

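# rank() must be applied to a single column; with method='first', ties within 'year' are broken by row order
df['new'] = df.groupby('id')['year'].rank(method='first').astype(int)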

    id  clae6   year    quarter new
0   1   475230.0    2007    1   1
1   1   475230.0    2007    2   2
2   1   475230.0    2007    3   3
3   1   475230.0    2007    4   4
4   1   475230.0    2008    1   5
5   1   475230.0    2008    2   6
6   2   475230.0    2007    1   1
7   2   475230.0    2007    2   2
8   2   475230.0    2007    3   3
9   2   475230.0    2007    4   4
10  2   475230.0    2008    1   5
11  3   475230.0    2010    1   1
12  3   475230.0    2010    2   2
13  3   475230.0    2010    3   3
14  3   475230.0    2010    4   4

Answer 1: (score: 2)

Use cumcount

df.groupby('id').cumcount().add(1)
Out[1574]: 
0     1
1     2
2     3
3     4
4     5
5     6
6     1
7     2
8     3
9     4
10    5
11    1
12    2
13    3
14    4
dtype: int64
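
To store the result as a column, the cumcount call above can be assigned directly. The sketch below is illustrative only: the sample frame is rebuilt by hand, and the sort step is an added assumption for data whose rows are not already in chronological order, since cumcount numbers rows in the order they appear.

```python
import pandas as pd

# Sample rows copied from the question; building the frame by hand is an assumption.
df = pd.DataFrame({
    "id":      [1] * 6 + [2] * 5 + [3] * 4,
    "clae6":   [475230.0] * 15,
    "year":    [2007] * 4 + [2008] * 2 + [2007] * 4 + [2008] + [2010] * 4,
    "quarter": [1, 2, 3, 4, 1, 2, 1, 2, 3, 4, 1, 1, 2, 3, 4],
})

# Assumption: order rows chronologically within each id first,
# because cumcount numbers rows in the order they appear.
df = df.sort_values(["id", "year", "quarter"]).reset_index(drop=True)

# cumcount() starts at 0, so add 1 to get a 1-based counter.
df["new_variable"] = df.groupby("id").cumcount() + 1

# Cross-check against the helper-column cumsum used in the question.
expected = df.assign(control=1).groupby("id")["control"].cumsum()
assert (df["new_variable"] == expected).all()
```

This avoids the helper column of ones entirely, because cumcount only needs the group labels.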