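I have a pandas DataFrame with two fields: DBA Name (the facility name) and License #. Many DBA names appear in several rows; some of those rows share the same license number and others do not.
I want to find the number of instances of each DBA name. I also want to find how many unique license numbers each one has.
I tried value_counts(), but it only works on a single field of the DataFrame. I also tried apply(), but that didn't work either.
My sample code and data are shown below. Any ideas?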
data = data[['DBA Name','License #']]
data:
DBA Name License #
1 BUSY BUMBLE BEE ACADEMY DAYCARE 2215472.0
2 BUSY BUMBLE BEE ACADEMY DAYCARE 3793.0
3 BUSY BUMBLE BEE ACADEMY DAYCARE 2215472.0
4 BUSY BUMBLE BEE ACADEMY DAYCARE 1194190.0
5 BUSY BUMBLE BEE ACADEMY DAYCARE 2215472.0
6 BUSY BUMBLE BEE ACADEMY DAYCARE 1194190.0
7 BUSY BUMBLE BEE ACADEMY DAYCARE 1194190.0
8 BUSY BUMBLE BEE ACADEMY DAYCARE 3793.0
9 BUSY BUMBLE BEE ACADEMY DAYCARE 3793.0
10 BOTTLES TO BOOKS LEARNING CENTER 1943545.0
11 BOTTLES TO BOOKS LEARNING CENTER 1943545.0
12 BOTTLES TO BOOKS LEARNING CENTER 1926534.0
13 BOTTLES TO BOOKS LEARNING CENTER 1926534.0
14 BOTTLES TO BOOKS LEARNING CENTER 1926534.0
15 BOTTLES TO BOOKS LEARNING CENTER 1943545.0
16 BOTTLES TO BOOKS LEARNING CENTER 1926534.0
17 BOTTLES TO BOOKS LEARNING CENTER 1943545.0
18 A CHILD'S WORLD EARLY LEARNING CENTER 1357825.0
19 A CHILD'S WORLD EARLY LEARNING CENTER 1357825.0
20 A CHILD'S WORLD EARLY LEARNING CENTER 1768092.0
21 A CHILD'S WORLD EARLY LEARNING CENTER 1768092.0
22 A CHILD'S WORLD EARLY LEARNING CENTER 1357825.0
23 A CHILD'S WORLD EARLY LEARNING CENTER 1768092.0
24 A CHILD'S WORLD EARLY LEARNING CENTER 1357825.0
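For comparison, the value_counts() approach from the question can be made to work by applying it twice: once to the name column (instance counts) and once after dropping duplicate (name, license) pairs (unique-license counts). This is a minimal sketch, assuming the frame holds only these two columns and using a small hypothetical subset of the rows above rather than the full table:

```python
import pandas as pd

# Small hypothetical subset of the question's data (not the full table)
data = pd.DataFrame({
    "DBA Name": ["BUSY BUMBLE BEE ACADEMY DAYCARE"] * 3
    + ["BOTTLES TO BOOKS LEARNING CENTER"] * 2,
    "License #": [2215472.0, 3793.0, 2215472.0, 1943545.0, 1926534.0],
})

# value_counts() on the single name column already gives the instance counts
instances = data["DBA Name"].value_counts()

# Unique licenses per name: drop duplicate (name, license) rows, then count names.
# This assumes 'data' contains only the two columns shown above.
unique_licenses = data.drop_duplicates()["DBA Name"].value_counts()

# Align the two Series on the shared name index, side by side
summary = pd.concat(
    [instances, unique_licenses], axis=1, keys=["instances", "unique_licenses"]
)
print(summary)
```

The groupby approach in the answer does the same in a single pass, so this variant is mainly useful for seeing why one value_counts() call alone is not enough.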
Answer:
Use pd.DataFrame.groupby together with agg, passing 'count' for the name column and 'nunique' for the license column:
import pandas as pd
# 'count' = rows per DBA name; 'nunique' = distinct license numbers per DBA name
data.groupby('DBA Name').agg({'DBA Name': 'count', 'License #': 'nunique'})
Output:
DBA Name License #
DBA Name
A CHILD'S WORLD EARLY LEARNING CENTER 7 2
BOTTLES TO BOOKS LEARNING CENTER 8 2
BUSY BUMBLE BEE ACADEMY DAYCARE 9 3
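The same summary can also be written with named aggregation (available since pandas 0.25), which avoids aggregating the grouping column itself and gives the result columns descriptive names. This sketch rebuilds the question's sample data; the column names instances and unique_licenses are illustrative choices, not anything from the original. Note that 'size' counts every row, whereas 'count' (as used in the answer) skips rows whose license number is missing.

```python
import pandas as pd

busy = "BUSY BUMBLE BEE ACADEMY DAYCARE"
bottles = "BOTTLES TO BOOKS LEARNING CENTER"
child = "A CHILD'S WORLD EARLY LEARNING CENTER"

# Rebuild the sample data shown in the question (rows 1 to 24)
data = pd.DataFrame({
    "DBA Name": [busy] * 9 + [bottles] * 8 + [child] * 7,
    "License #": [
        2215472.0, 3793.0, 2215472.0, 1194190.0, 2215472.0,
        1194190.0, 1194190.0, 3793.0, 3793.0,
        1943545.0, 1943545.0, 1926534.0, 1926534.0,
        1926534.0, 1943545.0, 1926534.0, 1943545.0,
        1357825.0, 1357825.0, 1768092.0, 1768092.0,
        1357825.0, 1768092.0, 1357825.0,
    ],
})

# Named aggregation: new_column=(input_column, aggregation_function)
summary = data.groupby("DBA Name").agg(
    instances=("License #", "size"),           # number of rows per name
    unique_licenses=("License #", "nunique"),  # distinct license numbers per name
)
print(summary)
```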