How do I compute a cumulative groupby count per point in time in pandas?

Time: 2019-06-18 14:42:36

Tags: python pandas dataframe

I have a df containing multiple weekly snapshots of JIRA tickets. I want to calculate the year-to-date number of tickets.

The df looks like this:

pointInTime   ticketId
2008-01-01         111
2008-01-01         222
2008-01-01         333
2008-01-07         444
2008-01-07         555
2008-01-07         666
2008-01-14         777
2008-01-14         888
2008-01-14         999

So if I run df.groupby(['pointInTime'])['ticketId'].count(), I get the number of IDs in each snapshot. What I want, however, is the cumulative sum of those counts,

giving a df that looks like this:

pointInTime   ticketId   cumCount
2008-01-01         111   3
2008-01-01         222   3
2008-01-01         333   3
2008-01-07         444   6
2008-01-07         555   6
2008-01-07         666   6
2008-01-14         777   9
2008-01-14         888   9
2008-01-14         999   9

So for 2008-01-07 the ticket count would be the count for 2008-01-07 plus the count for 2008-01-01.

3 Answers:

Answer 0: (score: 6)

Use GroupBy.count with cumsum, then map the result back onto the pointInTime column.
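
A minimal sketch of that approach, assuming pointInTime holds ISO-formatted date strings (so sorted order is chronological). The intermediate names per_snapshot and running_total are illustrative and do not come from the original answer:

```python
import pandas as pd

# Sample frame from the question: three weekly snapshots of three tickets each.
df = pd.DataFrame({
    "pointInTime": ["2008-01-01"] * 3 + ["2008-01-07"] * 3 + ["2008-01-14"] * 3,
    "ticketId": [111, 222, 333, 444, 555, 666, 777, 888, 999],
})

# Tickets per snapshot. groupby sorts its keys by default,
# so the cumulative sum below runs in date order.
per_snapshot = df.groupby("pointInTime")["ticketId"].count()
running_total = per_snapshot.cumsum()

# Look up each row's snapshot in the running totals.
df["cumCount"] = df["pointInTime"].map(running_total)
print(df)
```

Because the lookup goes through map, the rows of df do not have to be sorted for this to work.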


Answer 1: (score: 4)

I would use value_counts:

df.pointInTime.map(df.pointInTime.value_counts().sort_index().cumsum())
Out[207]: 
0    3
1    3
2    3
3    6
4    6
5    6
6    9
7    9
8    9
Name: pointInTime, dtype: int64

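Alternatively, with numpy imported as np and the rows already sorted by pointInTime, number the rows from 1 and take the last row number within each snapshot:

pd.Series(np.arange(len(df))+1,index=df.index).groupby(df['pointInTime']).transform('last')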
Out[216]: 
0    3
1    3
2    3
3    6
4    6
5    6
6    9
7    9
8    9
dtype: int32
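
The two variants in this answer differ in what they assume about row order. The sketch below uses a hypothetical shuffled frame with unequal snapshot sizes (not the data from the question): the value_counts variant works on the rows as they are, while the row-numbering variant has to be applied to a sorted copy first.

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows out of date order, snapshots of different sizes.
df = pd.DataFrame({
    "pointInTime": ["2008-01-14", "2008-01-01", "2008-01-07",
                    "2008-01-01", "2008-01-07", "2008-01-01"],
    "ticketId": [777, 111, 444, 222, 555, 333],
})

# Variant 1: count per date, sort by date, accumulate, then look up per row.
# This does not depend on the order of the rows in df.
by_counts = df["pointInTime"].map(
    df["pointInTime"].value_counts().sort_index().cumsum()
)

# Variant 2: row numbers only equal running totals on sorted data,
# so sort a copy, number its rows, and take the last number per snapshot.
ordered = df.sort_values("pointInTime", kind="stable")
row_number = pd.Series(np.arange(len(ordered)) + 1, index=ordered.index)
by_position = row_number.groupby(ordered["pointInTime"]).transform("last")

# Put the result back into the original row order.
by_position = by_position.reindex(df.index)

print(by_counts.tolist())
print(by_position.tolist())
```

Both lists should agree; skipping the sort_values step in the second variant would give row numbers that no longer correspond to running totals.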

Answer 2: (score: 3)

Here is an approach that uses transform with size and multiplies the result by the output of pd.factorize on pointInTime:

df['cumCount'] = (df.groupby('pointInTime').ticketId
                    .transform('size')
                    .mul(pd.factorize(df.pointInTime)[0]+1))

  pointInTime  ticketId  cumCount
0  2008-01-01       111         3
1  2008-01-01       222         3
2  2008-01-01       333         3
3  2008-01-07       444         6
4  2008-01-07       555         6
5  2008-01-07       666         6
6  2008-01-14       777         9
7  2008-01-14       888         9
8  2008-01-14       999         9
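
One caveat, shown here as a hedged sketch: multiplying the group size by the group's ordinal only equals the running total when every snapshot has the same number of rows, as in the question's sample. The frame below is hypothetical, with snapshots of 3, 2, and 1 tickets, and compares that product against a cumulative sum of group sizes:

```python
import pandas as pd

# Hypothetical data with unequal snapshot sizes: 3, 2 and 1 tickets.
df = pd.DataFrame({
    "pointInTime": ["2008-01-01"] * 3 + ["2008-01-07"] * 2 + ["2008-01-14"],
    "ticketId": [111, 222, 333, 444, 555, 666],
})

# The answer's formula: size of the row's group times the group's ordinal (1-based).
size_times_ordinal = (
    df.groupby("pointInTime")["ticketId"]
      .transform("size")
      .mul(pd.factorize(df["pointInTime"])[0] + 1)
)

# A cumulative sum of group sizes, mapped back onto the rows.
running_total = df["pointInTime"].map(df.groupby("pointInTime").size().cumsum())

print(size_times_ordinal.tolist())
print(running_total.tolist())
```

With equal-sized snapshots the two results coincide, which is why the formula reproduces the expected output on the question's data.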