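How do I apply cumcount to two columns?

Time: 2019-10-18 18:05:09

Tags: python pandas

I have a DataFrame with Name and Date columns. I want to create a count column that increments only when the date changes within a name. See the third column below: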

Name    Date          COLUMN I NEED
----    ----          -------------
Bob     11-01-2019          1
Bob     11-01-2019          1
Bob     11-20-2019          2
Mike    12-01-2019          1
Mike    12-02-2019          2
Mike    12-03-2019          3
Steve   01-01-2019          1
Steve   01-01-2019          1

I tried:

df['COLUMN RESULT'] = df.groupby(['Name'])['Date'].cumcount() + 1

Name    Date          COLUMN RESULT
----    ----          -------------
Bob     11-01-2019          1
Bob     11-01-2019          2
Bob     11-20-2019          3
Mike    12-01-2019          1
Mike    12-02-2019          2
Mike    12-03-2019          3
Steve   01-01-2019          1
Steve   01-01-2019          2

But this increments on every row, regardless of the date. Thanks for your help!
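For reference, here is a minimal reproduction of the behaviour described above, assuming the attempt used `cumcount` as the title suggests. The frame is rebuilt from the table in the question; the variable name `row_number` is only illustrative.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Name": ["Bob", "Bob", "Bob", "Mike", "Mike", "Mike", "Steve", "Steve"],
    "Date": ["11-01-2019", "11-01-2019", "11-20-2019", "12-01-2019",
             "12-02-2019", "12-03-2019", "01-01-2019", "01-01-2019"],
})

# cumcount() numbers the rows inside each Name group (0, 1, 2, ...),
# so the Date values never influence the result.
row_number = df.groupby("Name").cumcount() + 1
print(row_number.tolist())
```

Because `cumcount` only looks at row position within the group, repeated dates still get a new number.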

2 Answers:

Answer 0: (score: 4)

Use

df['result'] = df.Date.ne(df.Date.shift()).groupby(df.Name).cumsum().astype(int)

    Name        Date  result
0    Bob  11-01-2019       1
1    Bob  11-01-2019       1
2    Bob  11-20-2019       2
3   Mike  12-01-2019       1
4   Mike  12-02-2019       2
5   Mike  12-03-2019       3
6  Steve  01-01-2019       1
7  Steve  01-01-2019       1
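The one-liner above can be read in steps. The sketch below splits it up and, as an assumption that goes beyond the answer, also forces a change at the first row of every name, which may matter if one person's last date happens to equal the next person's first date. It assumes the rows are already sorted so that each name's rows are contiguous; `changed` and `new_name` are illustrative names.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Name": ["Bob", "Bob", "Bob", "Mike", "Mike", "Mike", "Steve", "Steve"],
    "Date": ["11-01-2019", "11-01-2019", "11-20-2019", "12-01-2019",
             "12-02-2019", "12-03-2019", "01-01-2019", "01-01-2019"],
})

# True wherever a row's Date differs from the row directly above it.
changed = df["Date"].ne(df["Date"].shift())

# Assumption beyond the answer: also mark the first row of each Name,
# so every group starts counting at 1 even if the date did not change.
new_name = df["Name"].ne(df["Name"].shift())

# A cumulative sum of the booleans per Name turns each change into a +1 step.
df["result"] = (changed | new_name).groupby(df["Name"]).cumsum().astype(int)
print(df)
```

On the sample data this should give the same column as the answer; the extra `new_name` mask only makes a difference at group boundaries.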

Answer 1: (score: 0)

Use groupby with apply and shift

df['result'] = df.groupby('Name')['Date'].apply(lambda x : x.ne(x.shift()).cumsum())
print(df)
    Name       Date  given_output  result
0    Bob 2019-11-01             1       1
1    Bob 2019-11-01             1       1
2    Bob 2019-11-20             2       2
3   Mike 2019-12-01             1       1
4   Mike 2019-12-02             2       2
5   Mike 2019-12-03             3       3
6  Steve 2019-01-01             1       1
7  Steve 2019-01-01             1       1
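Depending on the pandas version, `groupby(...).apply(...)` may return a result that also carries the group key in its index, which would not align cleanly when assigned back to the frame. A possible variant, not part of the answer, is to use `transform`, which returns a result indexed like the input. The helper name `change_counter` is hypothetical, and the dates are kept as strings here for simplicity.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Name": ["Bob", "Bob", "Bob", "Mike", "Mike", "Mike", "Steve", "Steve"],
    "Date": ["11-01-2019", "11-01-2019", "11-20-2019", "12-01-2019",
             "12-02-2019", "12-03-2019", "01-01-2019", "01-01-2019"],
})


def change_counter(dates: pd.Series) -> pd.Series:
    """Number the runs of equal values within one group, starting at 1."""
    # The first row compares against NaN, so it always counts as a change.
    return dates.ne(dates.shift()).cumsum()


# transform applies the helper per Name and keeps the original row index.
df["result"] = df.groupby("Name")["Date"].transform(change_counter).astype(int)
print(df)
```

Since the shift happens inside each group, this version does not compare a name's first date with the previous name's last date.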