Question

I have a DataFrame like this (output of tail(10); the leftmost column is the index):

        store_id  period_id  sales_volume      t  unique_period  q4
809838  38126884        242        1.3485    1.0      211447000   1
643854  38126899        240        2.9500  777.0      211448500   1
227299  38126899        242        6.2000  777.0      211450000   1
731859  38126908        240        2.2000  777.0      211451500   1
687269  38126908        241        2.6000    1.0      211451500   1
512944  38126926        241        3.9500  777.0      211453000   1
832513  38126935        240        0.9500  777.0      211454500   1
417892  38126935        242        4.6500  777.0      211456000   1
354468  38126938        241        5.1000  777.0      211457500   1
604276  38126938        242        3.2765    1.0      211457500   1
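To make the examples reproducible, the 10 rows above can be rebuilt as a DataFrame. This is a minimal sketch that assumes the leftmost column is the index and that the frame is named mc, as in the groupby code in the question; it covers only the tail shown, not the full data.

```python
import pandas as pd

# The last 10 rows shown in the question. The first value on each row
# (e.g. 809838) is the original index label, not a data column.
mc = pd.DataFrame(
    {
        "store_id": [38126884, 38126899, 38126899, 38126908, 38126908,
                     38126926, 38126935, 38126935, 38126938, 38126938],
        "period_id": [242, 240, 242, 240, 241, 241, 240, 242, 241, 242],
        "sales_volume": [1.3485, 2.95, 6.2, 2.2, 2.6,
                         3.95, 0.95, 4.65, 5.1, 3.2765],
        "t": [1.0, 777.0, 777.0, 777.0, 1.0,
              777.0, 777.0, 777.0, 777.0, 1.0],
        "unique_period": [211447000, 211448500, 211450000, 211451500, 211451500,
                          211453000, 211454500, 211456000, 211457500, 211457500],
        "q4": [1] * 10,
    },
    index=[809838, 643854, 227299, 731859, 687269,
           512944, 832513, 417892, 354468, 604276],
)
print(mc)
```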

I grouped the data like this:

mc[['store_id', 'unique_period']].groupby(['store_id','unique_period']).count()

The result is:

store_id    unique_period
4168621     1000
            2500
            4000
            5500
            7000
            8500
4168624     10000
4168636     11500
            13000
            14500
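A likely reason the result shows no values: both selected columns are used as group keys, so count() has no remaining column to count and returns a DataFrame whose only content is the MultiIndex. A small sketch on the tail rows above to check this (res is an illustrative name):

```python
import pandas as pd

# store_id / unique_period values from the 10 tail rows in the question.
mc = pd.DataFrame({
    "store_id": [38126884, 38126899, 38126899, 38126908, 38126908,
                 38126926, 38126935, 38126935, 38126938, 38126938],
    "unique_period": [211447000, 211448500, 211450000, 211451500, 211451500,
                      211453000, 211454500, 211456000, 211457500, 211457500],
})

# Both columns become index levels, so no data column is left for count():
# the groups exist only in the MultiIndex and the frame has zero columns.
res = mc[["store_id", "unique_period"]].groupby(["store_id", "unique_period"]).count()
print(res.shape)  # (number of distinct pairs, 0)
print(res.index.tolist())
```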

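But I need to count the unique periods in each group, for example:

group 4168621 has 6 records, group 4168624 has 1 record, and so on.

After that, I need to compute the median of that series.

I'm stuck, because the groupby result has no values, only the groupby index.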
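One way to get the per-group numbers from this two-level grouping, sketched on the tail rows (the names pairs and per_store are illustrative): size() counts rows per group and always returns a Series, and counting how many (store_id, unique_period) pairs fall under each store_id gives the number of unique periods per store.

```python
import pandas as pd

# store_id / unique_period values from the 10 tail rows in the question.
mc = pd.DataFrame({
    "store_id": [38126884, 38126899, 38126899, 38126908, 38126908,
                 38126926, 38126935, 38126935, 38126938, 38126938],
    "unique_period": [211447000, 211448500, 211450000, 211451500, 211451500,
                      211453000, 211454500, 211456000, 211457500, 211457500],
})

# One entry per distinct (store_id, unique_period) pair, valued by row count.
pairs = mc.groupby(["store_id", "unique_period"]).size()

# Number of distinct pairs under each store_id = unique periods per store.
per_store = pairs.groupby(level="store_id").size()
print(per_store)
print(per_store.median())
```

On these 10 rows the counts are 1, 2, 1, 1, 2, 1 and their median is 1.0.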

Answer 1

Use SeriesGroupBy.nunique together with median:

a = mc.groupby('store_id')['unique_period'].nunique()
print (a)
store_id
38126884    1
38126899    2
38126908    1
38126926    1
38126935    2
38126938    1
Name: unique_period, dtype: int64
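As a cross-check, nunique should agree with dropping duplicate (store_id, unique_period) pairs and counting the remaining rows per store. A sketch on the tail rows; note the two can differ if unique_period contains NaN, because nunique ignores NaN by default while drop_duplicates keeps one NaN row, which size() then counts.

```python
import pandas as pd

# store_id / unique_period values from the 10 tail rows in the question.
mc = pd.DataFrame({
    "store_id": [38126884, 38126899, 38126899, 38126908, 38126908,
                 38126926, 38126935, 38126935, 38126938, 38126938],
    "unique_period": [211447000, 211448500, 211450000, 211451500, 211451500,
                      211453000, 211454500, 211456000, 211457500, 211457500],
})

# Approach from the answer: distinct unique_period values per store.
a = mc.groupby("store_id")["unique_period"].nunique()

# Alternative: drop repeated (store_id, unique_period) pairs, then count rows.
b = mc.drop_duplicates(["store_id", "unique_period"]).groupby("store_id").size()

print((a == b).all())
```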

a = mc.groupby('store_id')['unique_period'].nunique().median()
print (a)
1.0

Edit:

If you need both the number of unique periods and the median of unique_period for each store:

a = mc.groupby('store_id')['unique_period'].agg(['nunique','median'])
print (a)
          nunique     median
store_id                    
38126884        1  211447000
38126899        2  211449250
38126908        1  211451500
38126926        1  211453000
38126935        2  211455250
38126938        1  211457500
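If readable column names are preferred, named aggregation (available in pandas 0.25 and later) can produce the same two statistics. A sketch on the tail rows; the column names n_periods and median_period are illustrative choices, not part of the original answer. The median column comes out as float64.

```python
import pandas as pd

# store_id / unique_period values from the 10 tail rows in the question.
mc = pd.DataFrame({
    "store_id": [38126884, 38126899, 38126899, 38126908, 38126908,
                 38126926, 38126935, 38126935, 38126938, 38126938],
    "unique_period": [211447000, 211448500, 211450000, 211451500, 211451500,
                      211453000, 211454500, 211456000, 211457500, 211457500],
})

# Named aggregation: each keyword becomes an output column name,
# each value is the aggregation applied to unique_period.
a = mc.groupby("store_id")["unique_period"].agg(
    n_periods="nunique",
    median_period="median",
)
print(a)
```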
