I have a dataset containing state codes and their statuses.
code status
1 AZ a
2 CA b
3 KS c
4 MO c
5 NY d
6 AZ d
7 MO a
8 MO b
9 MN b
10 NV a
11 NV e
12 MO f
13 NY a
14 NY a
15 NY b
I want to filter this dataset down to the rows whose status is a, and count them per code. The sample output would be:
code status
1 AZ a
2 MO a
3 NY a
AZ = 1, MO = 1, NY = 2
I tried df.groupyby("code").loc[df.status == 'a']
but had no luck.
Any help is appreciated!
Answer 0 (score: 2)
Filter the dataframe for status a first, then group by code and count.
df[df.status == 'a'].groupby('code').size()
Output:
code
AZ 1
MO 1
NV 1
NY 2
dtype: int64
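For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the table in the question, and the variable name `counts` is my own, not part of the original answer.

```python
import pandas as pd

# Rebuild the question's table (values copied from the question).
df = pd.DataFrame({
    "code": ["AZ", "CA", "KS", "MO", "NY", "AZ", "MO", "MO",
             "MN", "NV", "NV", "MO", "NY", "NY", "NY"],
    "status": ["a", "b", "c", "c", "d", "d", "a", "b",
               "b", "a", "e", "f", "a", "a", "b"],
})

# Boolean mask keeps only the rows with status 'a';
# groupby + size then counts the remaining rows per code.
counts = df[df.status == "a"].groupby("code").size()
print(counts)
```

Filtering before grouping matters here: a boolean mask built from df lines up with df's rows, not with the groupby object, so the mask has to be applied to the dataframe itself.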
Answer 1 (score: 0)
I recreated the dataset:
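import pandas as pd

# Two parallel lists: state codes and their statuses.
data = [["AZ","CA", "KS","MO","NY","AZ","MO","MO","MN","NV","NV","MO","NY","NY" ,"NY"],
        ["a","b","c","c","d","d","a","b","b","a","e","f","a","a","b"]]
# Transpose so each list becomes a column, then name the columns.
df = pd.DataFrame(data).T
df.columns = ["code", "status"]
# Keep only status "a", then count rows per (code, status) pair.
df[df["status"] == "a"].groupby(["code", "status"]).size()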
which gives
code status
AZ a 1
MO a 1
NV a 1
NY a 2
dtype: int64
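If a flat table is easier to work with than a Series with a two-level index, the result can be turned back into a DataFrame. This is only a sketch under the same data as above; the column name `count` and the variable name `result` are my own choices.

```python
import pandas as pd

# Same data as in the question, built directly as columns.
df = pd.DataFrame({
    "code": ["AZ", "CA", "KS", "MO", "NY", "AZ", "MO", "MO",
             "MN", "NV", "NV", "MO", "NY", "NY", "NY"],
    "status": ["a", "b", "c", "c", "d", "d", "a", "b",
               "b", "a", "e", "f", "a", "a", "b"],
})

result = (
    df[df["status"] == "a"]        # keep only status "a"
    .groupby(["code", "status"])   # one group per (code, status) pair
    .size()                        # number of rows in each group
    .reset_index(name="count")     # index levels become ordinary columns
)
print(result)
```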