I have the following df:
id a_id b_id
1 25 50
1 25 50
2 26 51
2 26 51
3 25 52
3 28 52
3 28 52
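I have the following code that sets a_id and b_id to -1 depending on the rows each id value has in df: if a given a_id (or b_id) value occupies exactly the same rows (the same sub-df) as a particular id value, those a_id (or b_id) entries are set to -1.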
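import pandas as pd

cluster_ids = df.loc[df['id'] > -1, 'id'].unique()
types = ['a_id', 'b_id']
for cluster_id in cluster_ids:
    # rows (sub-df) belonging to this id
    rows = df.loc[df['id'] == cluster_id]
    for col in types:
        ids = rows[col].values
        # all rows in df that share the group's first value in this column
        match_rows = df.loc[df[col] == ids[0]]
        # compare row labels, not whole frames: an earlier pass may already have
        # set the other column to -1, which would make DataFrame.equals fail
        if match_rows.index.equals(rows.index):
            df.loc[match_rows.index, col] = -1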
So the resulting df looks like:
id a_id b_id
1 25 -1
1 25 -1
2 -1 -1
2 -1 -1
3 25 -1
3 28 -1
3 28 -1
I'd like to know whether there is a more efficient way to do this.
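The rule above can be restated as a one-to-one test on (id, value) pairs: a value gets -1 when it appears under only one id and that id has no other value in the column. A minimal sketch of that restatement using `drop_duplicates` and `value_counts`, assuming the sample df from the question; `one_to_one_mask` is a hypothetical helper name, not part of pandas:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'id':   [1, 1, 2, 2, 3, 3, 3],
    'a_id': [25, 25, 26, 26, 25, 28, 28],
    'b_id': [50, 50, 51, 51, 52, 52, 52],
})

def one_to_one_mask(frame, col, key='id'):
    """Hypothetical helper: True where frame[col] holds a value that occurs
    under a single key, and that key has no other value in col."""
    pairs = frame[[key, col]].drop_duplicates()
    # After drop_duplicates, counting a key counts its distinct values, and vice versa
    values_per_key = pairs[key].map(pairs[key].value_counts())
    keys_per_value = pairs[col].map(pairs[col].value_counts())
    good_values = pairs.loc[(values_per_key == 1) & (keys_per_value == 1), col]
    # A good value belongs to exactly one key, so matching on the value alone suffices
    return frame[col].isin(good_values)

result = df.copy()
for col in ['a_id', 'b_id']:
    # masks are computed on the untouched df, so earlier assignments cannot interfere
    result.loc[one_to_one_mask(df, col), col] = -1
print(result)
```

Because each mask only looks at the `id` column and the column being tested, the order in which a_id and b_id are processed does not matter.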
Answer 0 (score: 3)
import pandas as pd

# True where every row of the same id has the same value in that column
one_value_for_each_id = df.groupby('id').transform(lambda x: len(set(x)) == 1)
a_id b_id
0 True True
1 True True
2 True True
3 True True
4 False True
5 False True
6 False True
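The per-group lambda can likely be replaced by pandas' built-in `nunique` transformation, which avoids calling Python code once per group. A sketch under that assumption, using the question's sample data; note that `nunique` ignores NaN by default whereas `len(set(x))` counts it, so the two only agree on data without missing values, such as this sample:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'id':   [1, 1, 2, 2, 3, 3, 3],
    'a_id': [25, 25, 26, 26, 25, 28, 28],
    'b_id': [50, 50, 51, 51, 52, 52, 52],
})

# Number of distinct values per id group, broadcast back to every row, then == 1
one_value_for_each_id = df.groupby('id')[['a_id', 'b_id']].transform('nunique').eq(1)
print(one_value_for_each_id)
```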
# True where every row with the same value in that column has the same id
one_id_for_each_value = pd.DataFrame({
    col: df.groupby(col).id.transform(lambda x: len(set(x)) == 1)
    for col in ['a_id', 'b_id']
})
a_id b_id
0 False True
1 False True
2 True True
3 True True
4 False True
5 True True
6 True True
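The same `nunique` substitution, sketched for the second mask under the same NaN caveat:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'id':   [1, 1, 2, 2, 3, 3, 3],
    'a_id': [25, 25, 26, 26, 25, 28, 28],
    'b_id': [50, 50, 51, 51, 52, 52, 52],
})

# For each column: number of distinct ids sharing a row's value, then == 1
one_id_for_each_value = pd.DataFrame({
    col: df.groupby(col)['id'].transform('nunique').eq(1)
    for col in ['a_id', 'b_id']
})
print(one_id_for_each_value)
```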
one_to_one_relationship = one_id_for_each_value & one_value_for_each_id
# Set all values that satisfy the one-to-one relationship to `-1`
df.loc[one_to_one_relationship.a_id, 'a_id'] = -1
df.loc[one_to_one_relationship.b_id, 'b_id'] = -1
a_id b_id
0 25 -1
1 25 -1
2 -1 -1
3 -1 -1
4 25 -1
5 28 -1
6 28 -1