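I have a fake dataset with a list of areas. Each area contains members, and each member has a value.
For each area, I want to count the number of unique members whose value satisfies a condition. I managed to solve it, but I'd like to know whether there is a cleaner way to do this in pandas.
Here is my attempt so far: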
import pandas as pd

# Building the fake dataset
dummy_dict = {
"area": ["A","A", "A","A","B","B"],
"member" : ["O1","O2","O2","O3","O1","O1"],
"value" : [90, 200, 200, 150, 120, 120]
}
df = pd.DataFrame(dummy_dict)
# Counting, per area, the number of unique members that satisfy the condition
value_cutoff = 100
df["nb_unique_members"] = df.groupby("area")["member"].transform("nunique")
df.loc[df["value"]>=value_cutoff,"tmp"] = df.loc[df["value"]>=value_cutoff].groupby("area")["member"].transform("nunique")
df["nb_unique_members_above_cutoff"] = df.groupby("area")["tmp"].transform("mean")
df.head()
Is there a better way to do this in pandas? Thanks in advance!
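One possible cleaner approach, offered as a minimal sketch rather than the only answer: blank out members below the cutoff with `Series.where`, then count distinct values per area with `nunique`, which skips NaN by default. This avoids the temporary `tmp` column and the `mean` trick, and an area with no qualifying member gets 0 instead of NaN. The per-area summary uses named aggregation, which assumes pandas 0.25 or later; the helper column `member_above` is a hypothetical name introduced here.

```python
import pandas as pd

# Same fake dataset as in the question
df = pd.DataFrame({
    "area": ["A", "A", "A", "A", "B", "B"],
    "member": ["O1", "O2", "O2", "O3", "O1", "O1"],
    "value": [90, 200, 200, 150, 120, 120],
})
value_cutoff = 100

# Keep the member name only where the value meets the cutoff; other rows become NaN
above = df["member"].where(df["value"] >= value_cutoff)

# Per-row columns: nunique ignores NaN, so only qualifying members are counted
df["nb_unique_members"] = df.groupby("area")["member"].transform("nunique")
df["nb_unique_members_above_cutoff"] = above.groupby(df["area"]).transform("nunique")

# Alternatively, one row per area (hypothetical helper column "member_above")
summary = (
    df.assign(member_above=above)
      .groupby("area")
      .agg(nb_unique_members=("member", "nunique"),
           nb_unique_members_above_cutoff=("member_above", "nunique"))
)

print(df)
print(summary)
```

With this data, area A has 3 unique members, 2 of them (O2 and O3) at or above the cutoff, and area B has 1 unique member (O1), which also qualifies.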