I have a dataframe with two columns, group_added and group_removed.
How can I remove a group from both group_added and group_removed whenever it appears in both columns?
For example:
id group_added group_removed
4 "G4,G3" "G4"
11 "G4,G3" "G3"
15 "G2,G3" "G2"
16 "G3" "G1"
26 "G2" "G3"
46 "G3" "G4"
50 "G4,G2" "G4"
I would like it to return something like:
id group_added group_removed
4 "G3" ""
11 "G4" ""
15 "G3" ""
16 "G3" "G1"
26 "G2" "G3"
46 "G3" "G4"
50 "G2" ""
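To reproduce the example, the input can be built as below. This is a sketch that assumes the quotes in the table are only for display and each cell holds a plain comma-separated string.

```python
import pandas as pd

# Example input from the question: each cell is a comma-separated string of groups.
df = pd.DataFrame({
    "id": [4, 11, 15, 16, 26, 46, 50],
    "group_added": ["G4,G3", "G4,G3", "G2,G3", "G3", "G2", "G3", "G4,G2"],
    "group_removed": ["G4", "G3", "G2", "G1", "G3", "G4", "G4"],
})
print(df)
```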
Answer 0 (score: 0)
Step 1: convert the strings into lists.
import pandas as pd
# Split each comma-separated string into a list of group names.
df['group_added'] = df['group_added'].str.split(',')
df['group_removed'] = df['group_removed'].str.split(',')
Result:
id group_added group_removed
0 4 [G4, G3] [G4]
1 11 [G4, G3] [G3]
2 15 [G2, G3] [G2]
3 16 [G3] [G1]
4 26 [G2] [G3]
5 46 [G3] [G4]
6 50 [G4, G2] [G4]
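One detail of `str.split(',')` worth knowing: an empty string becomes `['']`, not an empty list. The sketch below shows this. It is harmless for this problem, since `''` never matches a real group name, but it matters if the lists are later used for something like counting groups; filtering out empty tokens is one way to handle it.

```python
import pandas as pd

s = pd.Series(["G4,G3", "G3", ""])
# Each string is split on commas; note that "" becomes [''] rather than [].
parts = s.str.split(",")
print(parts.tolist())  # [['G4', 'G3'], ['G3'], ['']]

# Optional clean-up: drop empty tokens so a blank cell becomes an empty list.
cleaned = parts.apply(lambda xs: [g for g in xs if g])
print(cleaned.tolist())  # [['G4', 'G3'], ['G3'], []]
```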
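Step 2: apply the transformation, keeping in each column only the groups that do not appear in the other column.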
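# For each row, drop the groups that also appear in the other column.
# Both comprehensions read the original lists, so neither result affects the other.
result = df.apply(lambda row: pd.Series({
    'group_added': [g for g in row['group_added'] if g not in row['group_removed']],
    'group_removed': [g for g in row['group_removed'] if g not in row['group_added']]
}), axis=1)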
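The per-row logic of step 2 can be checked on plain Python lists, without pandas. `drop_common` is a hypothetical helper name; it builds the set of shared groups once and filters both lists against it, which preserves the original order and makes each membership test a set lookup instead of a list scan.

```python
def drop_common(added, removed):
    """Remove every group present in both lists, preserving the original order."""
    common = set(added) & set(removed)
    kept_added = [g for g in added if g not in common]
    kept_removed = [g for g in removed if g not in common]
    return kept_added, kept_removed

print(drop_common(["G4", "G3"], ["G4"]))  # (['G3'], [])
print(drop_common(["G3"], ["G1"]))        # (['G3'], ['G1'])
```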
Step 3: convert the lists back into comma-separated strings.
# ','.join on an empty list yields '', which produces the blank cells below.
result['group_added'] = result['group_added'].apply(','.join)
result['group_removed'] = result['group_removed'].apply(','.join)
Result:
group_added group_removed
0 G3
1 G4
2 G3
3 G3 G1
4 G2 G3
5 G3 G4
6 G2
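The result above has no id column, because the `apply` in step 2 builds a new frame from only the two keys it returns. Below is a sketch of the same idea without `apply`, assigning back into `df` so that `id` is kept. It assumes the columns still hold the original strings (i.e. it starts from the input, not from the output of step 1).

```python
import pandas as pd

df = pd.DataFrame({
    "id": [4, 11, 15, 16, 26, 46, 50],
    "group_added": ["G4,G3", "G4,G3", "G2,G3", "G3", "G2", "G3", "G4,G2"],
    "group_removed": ["G4", "G3", "G2", "G1", "G3", "G4", "G4"],
})

added = df["group_added"].str.split(",")
removed = df["group_removed"].str.split(",")

# Groups present in both columns, computed per row from the original values.
common = [set(a) & set(r) for a, r in zip(added, removed)]

# Filter both columns against the shared groups and join back into strings.
# ",".join([]) gives "", which is where the blank cells come from.
df["group_added"] = [",".join(g for g in a if g not in c) for a, c in zip(added, common)]
df["group_removed"] = [",".join(g for g in r if g not in c) for r, c in zip(removed, common)]
print(df)
```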
Answer 1 (score: 0)
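import pandas as pd
# Keep a copy of the original group_added: that column is overwritten before group_removed is computed.
df["backup_group_added"] = df["group_added"]
def get_column_difference(row, left_column, right_column):
    # Groups in left_column that are not in right_column, kept in their original order
    # (a plain set difference would return them in arbitrary order).
    right = set(row[right_column].split(","))
    return ",".join(g for g in row[left_column].split(",") if g not in right)
df['group_added'] = df.apply(lambda row: get_column_difference(row, 'group_added', 'group_removed'), axis=1)
df['group_removed'] = df.apply(lambda row: get_column_difference(row, 'group_removed', 'backup_group_added'), axis=1)
df.drop('backup_group_added', axis=1, inplace=True)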
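The backup column exists because `group_added` is overwritten before `group_removed` is computed. A minimal pure-Python sketch of what goes wrong without it, using a hypothetical `difference` helper that mirrors `get_column_difference` on plain strings:

```python
def difference(left, right):
    # Groups in `left` (comma-separated) that are not in `right`, in their original order.
    right_set = set(right.split(","))
    return ",".join(g for g in left.split(",") if g not in right_set)

added, removed = "G4,G3", "G4"
new_added = difference(added, removed)          # "G3"

# Wrong: comparing against the already-updated value lets G4 survive.
wrong_removed = difference(removed, new_added)  # "G4"
# Right: compare against the original value, which is what the backup column preserves.
right_removed = difference(removed, added)      # ""
print(new_added, repr(wrong_removed), repr(right_removed))
```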