Weekly counts of unique column values, showing each value's maximum weekly count

Posted: 2019-02-04 16:57:46

Tags: pandas pandas-groupby

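I am trying to count how many times each unique value in a column occurs per week, and then find the highest weekly count for each value over the given time period. A sample of the initial DataFrame is shown below.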

date            company

2014-12-01      bank of america corp
2014-12-01      bank of america corp
2014-12-01      jpmorgan chase & co
2014-12-01      jpmorgan chase & co
2014-12-01      morgan stanley
2014-12-01      morgan stanley
2014-12-01      intel corp
2014-12-01      goldman sachs group inc
2014-12-01      bank of america corp
2014-12-01      jpmorgan chase & co
2014-12-02      berkshire hathaway inc
2014-12-02      berkshire hathaway inc
2014-12-02      berkshire hathaway inc
2014-12-02      berkshire hathaway inc
2014-12-02      bank of america corp
2014-12-02      bank of america corp
2014-12-02      jpmorgan chase & co
2014-12-02      jpmorgan chase & co
2014-12-02      morgan stanley
2014-12-03      morgan stanley
2014-12-03      jpmorgan chase & co
2014-12-03      bank of america corp
2014-12-03      morgan stanley
2014-12-03      goldman sachs group inc
2014-12-03      bank of america corp
2014-12-03      jpmorgan chase & co
2014-12-03      goldman sachs group inc
.....           ...........
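To try the answers below on this excerpt, the listed rows can be loaded into a DataFrame as follows. This is a convenience sketch: it contains only the 27 rows shown above, not the elided multi-year data. It also parses `date` with `pd.to_datetime`, because both answers do date arithmetic on that column. The names `sample` and `df` are chosen for illustration.

```python
import pandas as pd

# Company mentions per day, copied from the sample rows above.
# The elided "....." rows (the rest of the multi-year data) are not included.
sample = {
    "2014-12-01": ["bank of america corp", "bank of america corp",
                   "jpmorgan chase & co", "jpmorgan chase & co",
                   "morgan stanley", "morgan stanley", "intel corp",
                   "goldman sachs group inc", "bank of america corp",
                   "jpmorgan chase & co"],
    "2014-12-02": ["berkshire hathaway inc"] * 4
                  + ["bank of america corp", "bank of america corp",
                     "jpmorgan chase & co", "jpmorgan chase & co",
                     "morgan stanley"],
    "2014-12-03": ["morgan stanley", "jpmorgan chase & co",
                   "bank of america corp", "morgan stanley",
                   "goldman sachs group inc", "bank of america corp",
                   "jpmorgan chase & co", "goldman sachs group inc"],
}

# One row per mention, with "date" parsed to datetime64 so that
# pd.Grouper and date offsets can be applied to it.
df = pd.DataFrame(
    [(day, company) for day, companies in sample.items() for company in companies],
    columns=["date", "company"],
)
df["date"] = pd.to_datetime(df["date"])
```

On these rows the per-company totals are 7 (bank of america corp), 7 (jpmorgan chase & co), 5 (morgan stanley), 4 (berkshire hathaway inc), 3 (goldman sachs group inc) and 1 (intel corp). These are the same figures that Answer 0 reports.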

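I want to group the DataFrame by week, count the mentions of each company, and show the week in which each company appears most often. Each company should get exactly one row, containing its most-mentioned week. A sample of the expected DataFrame is shown below: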

date            company                   top_week_count

2014-12-07      bank of america corp      22
2014-12-07      jpmorgan chase & co       12
2014-12-14      morgan stanley            15
2014-12-14      goldman sachs group inc   29
2014-12-21      berkshire hathaway inc    35
.....           ....                      ..

The DataFrames above are only short excerpts of the full DataFrame, which spans several years.

Any help would be greatly appreciated!
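For reference, the computation described in the question can be sketched in two steps. First, count the rows per (week, company). Second, keep each company's row with the largest count. This is a sketch, not one of the posted answers, and it makes three assumptions:

* `date` holds datetimes.
* `freq="W"` means weeks ending on Sunday, which matches the 2014-12-07 labels in the expected output.
* The input is hypothetical toy data rather than the question's rows, so that there are two weeks to choose from.

```python
import pandas as pd

# Hypothetical toy rows (not from the question) covering two Sunday-ending weeks.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2014-12-01", "2014-12-02", "2014-12-03",                # week ending 2014-12-07
        "2014-12-08", "2014-12-09", "2014-12-10", "2014-12-11",  # week ending 2014-12-14
    ]),
    "company": [
        "morgan stanley", "morgan stanley", "intel corp",
        "morgan stanley", "intel corp", "intel corp", "intel corp",
    ],
})

# Step 1: number of rows per (week, company); freq="W" labels each week by its Sunday.
weekly = (
    df.groupby([pd.Grouper(key="date", freq="W"), "company"])
      .size()
      .reset_index(name="top_week_count")
)

# Step 2: for each company, keep the row whose weekly count is largest.
top = weekly.loc[weekly.groupby("company")["top_week_count"].idxmax()].reset_index(drop=True)
```

Here `top` has one row per company with the columns `date`, `company` and `top_week_count`. `idxmax` returns the first maximum it finds, and `weekly` is sorted by date, so a company whose busiest weeks tie is assigned the earliest of those weeks.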

2 Answers:

Answer 0 (score: 2)

Try:

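import pandas as pd
from pandas.tseries.offsets import Week
df['date'] = pd.to_datetime(df['date'])
# Shift each date forward to the next Friday (a Friday moves to the following Friday)
df['weekend'] = df['date'] + Week(weekday=4)
df.groupby(['weekend', 'company']).size().reset_index(name='top_week_count')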

    weekend     company                top_week_count
0   2014-12-05  bank of america corp    7
1   2014-12-05  berkshire hathaway inc  4
2   2014-12-05  goldman sachs group inc 3
3   2014-12-05  intel corp              1
4   2014-12-05  jpmorgan chase & co     7
5   2014-12-05  morgan stanley          5
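A detail of this approach: adding `Week(weekday=4)` always moves a date forward, so a mention dated on a Friday is counted in the *following* Friday's week. Whether that matters depends on how the weeks should be defined. The sketch below uses dates chosen for illustration. It contrasts the `+` shift with the offset's `rollforward` method, which leaves dates that already fall on a Friday unchanged.

```python
import pandas as pd
from pandas.tseries.offsets import Week

# Thursday, Friday and Saturday of the same calendar week (illustrative dates).
dates = pd.Series(pd.to_datetime(["2014-12-04", "2014-12-05", "2014-12-06"]))

# Adding the offset always moves forward: the Friday (2014-12-05)
# lands on the following Friday (2014-12-12).
shifted = dates + Week(weekday=4)

# rollforward only moves dates that are not already Fridays.
rolled = dates.map(Week(weekday=4).rollforward)
```

To keep Friday mentions in their own week, Answer 0's `weekend` column could be built with `df['date'].map(Week(weekday=4).rollforward)` instead. Note that `map` works element by element, so it may be slower than the vectorized `+` on a multi-year frame.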

Answer 1 (score: 2)

Try:

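import pandas as pd
df['date'] = pd.to_datetime(df['date'])  # freq='W' below means weeks ending on Sunday
df.groupby([pd.Grouper(freq='W', key='date'), 'company'])['company']\
.agg(['count']).reset_index().sort_values('count', ascending=False)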
        date                      company  count
0 2014-12-07         bank of america corp      7
5 2014-12-07          jpmorgan chase & co      7
6 2014-12-07               morgan stanley      5
1 2014-12-07       berkshire hathaway inc      4
2 2014-12-07      goldman sachs group inc      2
3 2014-12-07  goldman sachs group inc/the      1
4 2014-12-07                   intel corp      1
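Sorting by count still leaves one row per (week, company). To get one row per company with its top week, as the question asks, the sorted result can be passed through `drop_duplicates('company')`, which keeps the first (highest-count) row for each company. The sketch below makes three assumptions:

* It uses `.size()` in place of `['company'].agg(['count'])`. Both count the rows in each group here, because `company` has no missing values.
* It sorts with `kind="stable"` so that tied counts keep their date order.
* It runs on hypothetical toy data.

```python
import pandas as pd

# Hypothetical toy rows (not the question's data) spanning two Sunday-ending weeks.
df = pd.DataFrame({
    "date": pd.to_datetime(["2014-12-01", "2014-12-02", "2014-12-03",
                            "2014-12-08", "2014-12-09", "2014-12-10", "2014-12-11"]),
    "company": ["bank of america corp", "bank of america corp", "goldman sachs group inc",
                "bank of america corp", "goldman sachs group inc",
                "goldman sachs group inc", "goldman sachs group inc"],
})

# Weekly counts as in this answer, sorted so each company's busiest week comes first.
counts = (
    df.groupby([pd.Grouper(key="date", freq="W"), "company"])
      .size()
      .reset_index(name="count")
      .sort_values("count", ascending=False, kind="stable")
)

# Keep only the first (highest-count) row per company: its top week.
top = counts.drop_duplicates("company").reset_index(drop=True)
```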