I have the following df,
inv_id  cluster_id
793     2
        2
789     3
789     3
        4
        4
I would like to groupby cluster_id and check how many unique inv_id values each group has,
df['same_inv_id'] = df.groupby('cluster_id')['inv_id'].transform('nunique') == 1
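However, when a cluster contains only empty/blank inv_id values, or when a cluster contains one or more empty/blank inv_id values, I would like to set same_inv_id = False, so the result looks like,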
inv_id  cluster_id  same_inv_id
793     2           False
        2           False
789     3           True
789     3           True
        4           False
        4           False
Answer 0 (score: 2)
IIUC, build the non-blank condition first, then use transform + all to check it per cluster
s1=df.inv_id.ne('').groupby(df.cluster_id).transform('all')
s1
Out[432]:
0 False
1 False
2 True
3 True
4 False
5 False
Name: inv_id, dtype: bool
s2=df.groupby('cluster_id')['inv_id'].transform('nunique') == 1
# Combine both conditions: no blank inv_id in the cluster, and exactly one unique inv_id
df['same_inv_id'] = s1 & s2
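For reference, here is a minimal end-to-end sketch of the approach above. It assumes the blank inv_id entries are stored as empty strings ('') and that inv_id is held as text; the sample frame is rebuilt by hand from the question, so its exact dtypes are an assumption rather than something the question states.

```python
import pandas as pd

# Sample data mirroring the question.
# Assumption: blank inv_id values are empty strings, not NaN.
df = pd.DataFrame({
    "inv_id": ["793", "", "789", "789", "", ""],
    "cluster_id": [2, 2, 3, 3, 4, 4],
})

# s1: True only for rows whose cluster has no blank inv_id at all.
s1 = df["inv_id"].ne("").groupby(df["cluster_id"]).transform("all")

# s2: True for rows whose cluster has exactly one distinct inv_id.
# Note that '' counts as a value here, so cluster 4 passes this check on its own.
s2 = df.groupby("cluster_id")["inv_id"].transform("nunique") == 1

# A cluster qualifies only when both conditions hold.
df["same_inv_id"] = s1 & s2

print(df)
```

If the blanks are actually NaN rather than '', nunique skips them by default, so the non-blank condition would probably need to be df["inv_id"].notna() instead of ne('').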