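I want to keep only those unique values in col1 of df for which every unique value of col2 is present. What is a good way to solve this?
Here is an example to illustrate the problem.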
import pandas as pd

d = {
    'col1': ['alfa','alfa','beta','beta','beta','charlie','delta','delta','echo','foxtrot','foxtrot'],
    'col2': ['sweden','norway','norway','sweden','denmark','norway','sweden','norway','denmark','denmark','norway']
}
df = pd.DataFrame(data=d)
print(df)
col1     col2
alfa     sweden
alfa     norway
beta     norway
beta     sweden
beta     denmark
charlie  norway
delta    sweden
delta    norway
echo     denmark
foxtrot  denmark
foxtrot  norway
Desired result: df2
col1     col2
beta     [norway, sweden, denmark]
Answer 0 (score: 0)
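First build a set of the unique country names in col2. Then group the col2 values into one list per col1 value, and apply a boolean mask that keeps only the groups whose set of countries equals the full set: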
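df = pd.DataFrame(data=d)
# All distinct values that appear in col2
unique = set(df["col2"].unique())
# col2 values collected into one list per col1 group
grouped = df.groupby("col1")["col2"].apply(list)
# True for groups whose set of col2 values equals the full set
x = df.groupby("col1")["col2"].apply(set) == unique
print(grouped[x])
# Output:
# col1
# beta    [norway, sweden, denmark]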
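A possible variation on the same idea, shown only as a sketch: instead of comparing sets, count the distinct col2 values per group and compare that count with the number of distinct values in the whole column. The names n_total and complete are made up for this illustration, and it assumes col2 has no missing values (nunique ignores NaN by default). The final reset_index turns the result into a DataFrame with the col1 and col2 columns asked for in the question.

```python
import pandas as pd

d = {
    "col1": ["alfa", "alfa", "beta", "beta", "beta", "charlie",
             "delta", "delta", "echo", "foxtrot", "foxtrot"],
    "col2": ["sweden", "norway", "norway", "sweden", "denmark", "norway",
             "sweden", "norway", "denmark", "denmark", "norway"],
}
df = pd.DataFrame(data=d)

# Number of distinct values in col2 across the whole frame (illustrative name).
n_total = df["col2"].nunique()

# A group is "complete" when it contains every distinct col2 value.
complete = df.groupby("col1")["col2"].nunique() == n_total

# Collect the col2 values per group, keep only the complete groups,
# and turn the result back into a two-column DataFrame.
grouped = df.groupby("col1")["col2"].apply(list)
df2 = grouped[complete].reset_index()
print(df2)
```

Counting works here because a group can only reach the total count if it contains every distinct value, so the check is equivalent to the set comparison for this data.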