How to get row counts per group in pandas

Time: 2016-08-22 04:28:33

Tags: python pandas

In pandas, I have a DataFrame (app_categ_events):

> print(app_categ_events.label_id.unique().shape)
> print(app_categ_events.category.unique().shape)

Out:
(492,)
(458,)

I want to find any category that has more than one label_id (because I think the mapping should be one-to-one).

In R data.table, I could do this:

app_categ_events[, count_rows := .N, by = list(category, label_id)]
# (or smth of that sort...)
print(app_categ_events[count_rows > 1])

What is the best way to do this in pandas?

3 answers:

Answer 0: (score: 3)

We use transform to create the 'count_rows' column after grouping by 'category' and 'label_id':

app_categ_events['count_rows'] = app_categ_events.groupby(['category', 
                  'label_id'])['label_id'].transform('count')
print(app_categ_events)
#  category  label_id  count_rows
#0        a         1           2
#1        a         1           2
#2        b         2           1
#3        b         3           1

Now, the equivalent of the data.table code shown in the OP's post is

print(app_categ_events[app_categ_events.count_rows>1])
#    category  label_id  count_rows
#0        a         1           2
#1        a         1           2

Data

import pandas as pd
app_categ_events = pd.DataFrame({'category': ['a', 'a', 'b', 'b'], 'label_id': [1, 1, 2, 3]})
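
As a variant of the above, here is a minimal sketch that assumes the same toy data. `groupby(...).size()` gives one row per (category, label_id) pair with its row count, which may be convenient when a summary table is wanted rather than a new column on the original frame. The names `pair_counts` and `repeated` are illustrative only and do not come from the answer.

```python
import pandas as pd

app_categ_events = pd.DataFrame({'category': ['a', 'a', 'b', 'b'],
                                 'label_id': [1, 1, 2, 3]})

# One row per (category, label_id) pair, with the number of rows in that pair
pair_counts = (app_categ_events
               .groupby(['category', 'label_id'])
               .size()
               .reset_index(name='count_rows'))

# Pairs that occur more than once in the original data
repeated = pair_counts[pair_counts['count_rows'] > 1]
print(repeated)
```

Note that counting rows per pair finds duplicated pairs; it does not by itself show whether a category maps to several different label_id values.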

Answer 1: (score: 2)

You can use filtration to return the desired result.

df = pd.DataFrame({'label_id': [1, 1, 2, 3], 
                   'category': ['a', 'b', 'b', 'c']})

df.groupby(['category']).filter(lambda group: len(group) > 1)
  category  label_id
1        b         1
2        b         2
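
The filter above keeps categories with more than one row. If the goal is the one-to-one check from the question, a possible adaptation (a sketch, not part of the original answer) is to count distinct values inside the lambda with `nunique()`. The names `multi_label` and `multi_category` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'label_id': [1, 1, 2, 3],
                   'category': ['a', 'b', 'b', 'c']})

# Keep only categories that map to more than one distinct label_id
multi_label = df.groupby('category').filter(
    lambda group: group['label_id'].nunique() > 1)

# Same check in the other direction: label_ids that appear under several categories
multi_category = df.groupby('label_id').filter(
    lambda group: group['category'].nunique() > 1)

print(multi_label)
print(multi_category)
```

If both results are empty, the mapping is one-to-one in this data.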

Answer 2: (score: 1)

Assume:

app_categ_events = pd.DataFrame({'category': ['a', 'a', 'b', 'b'],
                                 'label_id': [1, 1, 2, 3]})

Solution:

# identify categories with greater than 1 number of related label_id's
cat_mask = app_categ_events.groupby('category')['label_id'].nunique().gt(1)
cats = cat_mask[cat_mask]

# filter data
app_categ_events[app_categ_events.category.isin(cats.index)]
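
The two steps above can probably be collapsed into one with `transform('nunique')`, which broadcasts the distinct-label count back onto every row. This is a sketch under the same assumed data; `n_labels` and `violations` are illustrative names.

```python
import pandas as pd

app_categ_events = pd.DataFrame({'category': ['a', 'a', 'b', 'b'],
                                 'label_id': [1, 1, 2, 3]})

# Number of distinct label_id values per category, aligned with the original rows
n_labels = app_categ_events.groupby('category')['label_id'].transform('nunique')

# Rows whose category is linked to more than one label_id
violations = app_categ_events[n_labels > 1]
print(violations)
```

Because the transformed Series shares the original index, it can be used directly as a boolean mask without the intermediate `isin` lookup.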
