Apply an iterative function to each group in a pandas DataFrame

Time: 2016-07-27 03:42:53

Tags: python pandas dataframe functional-programming

I have a large pandas DataFrame in the following format:

        prod_id     timestamp     text
150523  0006641040  9.393408e+08  text_1 
150500  0006641040  9.408096e+08  text_2 
150499  0006641041  1.009325e+09  text_3 
150508  0006641041  1.018397e+09  text_4 
150524  0006641042  1.025482e+09  text_5

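The DataFrame is sorted by prod_id and timestamp. What I want is a counter that enumerates the rows of each prod_id from the earliest timestamp to the latest. For example, I am trying to achieve something like this: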

        prod_id     timestamp     text    enum  
150523  0006641040  9.393408e+08  text_1  1
150500  0006641040  9.408096e+08  text_2  2 
150499  0006641041  1.009325e+09  text_3  1 
150508  0006641041  1.018397e+09  text_4  2 
150524  0006641042  1.025482e+09  text_5  1

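I can do this quite easily in an iterative way by looping over every row and incrementing a counter, but is there a way to do it in a more functional-programming style?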

Thanks

1 answer:

Answer 0 (score: 3)

UPDATE:

In [324]: df
Out[324]:
        prod_id     timestamp    text
150523  6641040  9.393408e+08  text_1
150500  6641040  9.408096e+08  text_2
150501  6641040  9.408096e+08  text_3
150499  6641041  1.009325e+09  text_3
150508  6641041  1.018397e+09  text_4
150524  6641042  1.025482e+09  text_5

In [325]: df['enum'] = df.groupby(['prod_id'])['timestamp'].cumcount() + 1

In [326]: df
Out[326]:
        prod_id     timestamp    text  enum
150523  6641040  9.393408e+08  text_1     1
150500  6641040  9.408096e+08  text_2     2
150501  6641040  9.408096e+08  text_3     3
150499  6641041  1.009325e+09  text_3     1
150508  6641041  1.018397e+09  text_4     2
150524  6641042  1.025482e+09  text_5     1
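
For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch of the same cumcount approach. The frame is rebuilt by hand from the output above. The explicit sort_values call is my own addition: cumcount numbers rows by their position within each group, so it only matches timestamp order if the frame is already sorted that way.

```python
import pandas as pd

# Sample data copied from the session above; prod_id is kept as a string
# here so that the leading zeros from the question survive.
df = pd.DataFrame(
    {
        "prod_id": ["0006641040", "0006641040", "0006641040",
                    "0006641041", "0006641041", "0006641042"],
        "timestamp": [9.393408e08, 9.408096e08, 9.408096e08,
                      1.009325e09, 1.018397e09, 1.025482e09],
        "text": ["text_1", "text_2", "text_3", "text_3", "text_4", "text_5"],
    },
    index=[150523, 150500, 150501, 150499, 150508, 150524],
)

# cumcount() numbers rows by position inside each group (starting at 0),
# so sort first to make that position follow the timestamp.
df = df.sort_values(["prod_id", "timestamp"])

# +1 turns the zero-based position into a one-based counter.
df["enum"] = df.groupby("prod_id").cumcount() + 1

print(df)
```

Since the counter depends only on row order, the groupby can select any column (as in `['timestamp']` above) or none at all; the result is the same.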

OLD answer:

In [314]: df['enum'] = df.groupby(['prod_id'])['timestamp'].rank().astype(int)

In [315]: df
Out[315]:
        prod_id     timestamp    text  enum
150523  6641040  9.393408e+08  text_1     1
150500  6641040  9.408096e+08  text_2     2
150499  6641041  1.009325e+09  text_3     1
150508  6641041  1.018397e+09  text_4     2
150524  6641042  1.025482e+09  text_5     1
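
The answer does not say why it was updated, so the following is an assumption on my part: the updated sample adds a row with a duplicate timestamp, and that is exactly the case where rank() behaves differently from cumcount(). With the default method="average", tied rows share a fractional rank, which astype(int) then truncates to the same number. A small sketch with made-up data (the prod_id values "a" and "b" are purely illustrative):

```python
import pandas as pd

# Hypothetical data: group "a" contains two rows with the same timestamp.
df = pd.DataFrame(
    {
        "prod_id": ["a", "a", "a", "b"],
        "timestamp": [1.0, 2.0, 2.0, 3.0],
    }
)

grouped = df.groupby("prod_id")["timestamp"]

# Default method="average": the two tied rows each receive rank 2.5.
avg_rank = grouped.rank()

# Truncating to int gives the tied rows the same counter value.
truncated = avg_rank.astype(int)

# method="first" breaks ties by order of appearance within the group.
first_rank = grouped.rank(method="first").astype(int)

# cumcount() ignores the values and counts row positions instead.
counter = df.groupby("prod_id").cumcount() + 1

print(pd.DataFrame({"avg": avg_rank, "truncated": truncated,
                    "first": first_rank, "cumcount": counter}))
```

On sorted data, rank(method="first") and cumcount() + 1 agree. On unsorted data they can differ, because rank looks at the timestamp values while cumcount looks only at row position.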