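I have a DataFrame with a Name column and a Date column. I want to create a count column that increments only when the date changes. See the third column below: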
Name Date COLUMN I NEED
---- ---- -------------
Bob 11-01-2019 1
Bob 11-01-2019 1
Bob 11-20-2019 2
Mike 12-01-2019 1
Mike 12-02-2019 2
Mike 12-03-2019 3
Steve 01-01-2019 1
Steve 01-01-2019 1
I tried:
df['COLUMN RESULT'] = df.groupby(['Name']).cumcount() + 1
Name Date COLUMN RESULT
---- ---- -------------
Bob 11-01-2019 1
Bob 11-01-2019 2
Bob 11-20-2019 3
Mike 12-01-2019 1
Mike 12-02-2019 2
Mike 12-03-2019 3
Steve 01-01-2019 1
Steve 01-01-2019 2
but it increments on every row, regardless of the date. Any help is appreciated, thanks!
Answer 0 (score: 4)
Use:
df['result'] = df.Date.ne(df.Date.shift()).groupby(df.Name).cumsum().astype(int)
Name Date result
0 Bob 11-01-2019 1
1 Bob 11-01-2019 1
2 Bob 11-20-2019 2
3 Mike 12-01-2019 1
4 Mike 12-02-2019 2
5 Mike 12-03-2019 3
6 Steve 01-01-2019 1
7 Steve 01-01-2019 1
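One caveat with the answer above: df.Date.shift() shifts across the whole frame, not within each Name, so the first row of a group is compared with the last row of the previous group. If those two dates happen to be equal, the new group's counter starts at 0 instead of 1. A minimal sketch of that edge case, using hypothetical data (not from the question) in which Bob's last date equals Mike's first date, next to a variant that shifts within each group via groupby().shift():

```python
import pandas as pd

# Hypothetical data: Bob's last date equals Mike's first date
df = pd.DataFrame({
    "Name": ["Bob", "Bob", "Mike", "Mike"],
    "Date": ["11-01-2019", "12-01-2019", "12-01-2019", "12-02-2019"],
})

# Frame-wide shift: Mike's first row is compared with Bob's last row,
# so it is not flagged as a change and Mike's count starts at 0
df["global_shift"] = (
    df["Date"].ne(df["Date"].shift())
    .groupby(df["Name"]).cumsum().astype(int)
)

# Per-group shift: each group's first row is compared with NaN,
# which is always "not equal", so every group starts at 1
df["group_shift"] = (
    df["Date"].ne(df.groupby("Name")["Date"].shift())
    .groupby(df["Name"]).cumsum().astype(int)
)

print(df)
```

With this data global_shift comes out as 1, 2, 0, 1 while group_shift gives 1, 2, 1, 2. On the question's sample data both versions agree, because no group starts on the same date the previous group ended on.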
Answer 1 (score: 0)
Use groupby, transform and shift:
df['result'] = df.groupby('Name')['Date'].transform(lambda x: x.ne(x.shift()).cumsum())
print(df)
Name Date given_output result
0 Bob 2019-11-01 1 1
1 Bob 2019-11-01 1 1
2 Bob 2019-11-20 2 2
3 Mike 2019-12-01 1 1
4 Mike 2019-12-02 2 2
5 Mike 2019-12-03 3 3
6 Steve 2019-01-01 1 1
7 Steve 2019-01-01 1 1
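Both answers count changes relative to the previous row, so a date that reappears later in the same group (for example A, B, A) increments the counter again. If the intent is instead to number distinct dates per Name (an assumption about the desired semantics; the question's sample data cannot tell the two apart), one option is a dense rank of the parsed dates within each group. A minimal sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical data in which dates repeat non-consecutively
df = pd.DataFrame({
    "Name": ["Bob", "Bob", "Bob", "Bob"],
    "Date": ["11-01-2019", "11-20-2019", "11-01-2019", "11-20-2019"],
})

# Parse the month-day-year strings so ranking follows calendar order
df["Date"] = pd.to_datetime(df["Date"], format="%m-%d-%Y")

# Consecutive-change count, as in the answers above
df["changes"] = df.groupby("Name")["Date"].transform(
    lambda s: s.ne(s.shift()).cumsum()
)

# Dense rank per group: equal dates share a number, with no gaps
df["distinct"] = (
    df.groupby("Name")["Date"]
    .transform(lambda s: s.rank(method="dense"))
    .astype(int)
)

print(df)
```

Here changes gives 1, 2, 3, 4 and distinct gives 1, 2, 1, 2. Note that a dense rank numbers dates in calendar order, not in order of appearance, so it matches the "change count" only when each group's dates are already sorted.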