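How to remove column elements that also appear in another column

Time: 2019-07-05 09:15:15

Tags: python pandas dataframe

I have a DataFrame with two columns, group_added and group_removed. If an element of group_removed also exists in group_added, how can I remove that element from group_removed and, at the same time, from group_added?

For example: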

id      group_added    group_removed
4       "G4,G3"        "G4"
11      "G4,G3"        "G3"
15      "G2,G3"        "G2"
16      "G3"           "G1"
26      "G2"           "G3"
46      "G3"           "G4"
50      "G4,G2"        "G4"

I would like it to return something like:

id      group_added    group_removed
4       "G3"           ""
11      "G4"           ""
15      "G3"           ""
16      "G3"           "G1"
26      "G2"           "G3"
46      "G3"           "G4"
50      "G2"           ""

2 answers:

Answer 0: (score: 0)

Step 1: convert the strings to lists.

import pandas as pd

df['group_added'] = df['group_added'].str.split(',')
df['group_removed'] = df['group_removed'].str.split(',')

Result:

   id group_added group_removed
0   4    [G4, G3]          [G4]
1  11    [G4, G3]          [G3]
2  15    [G2, G3]          [G2]
3  16        [G3]          [G1]
4  26        [G2]          [G3]
5  46        [G3]          [G4]
6  50    [G4, G2]          [G4]
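One caveat with the split in step 1: `"".split(",")` gives `[""]` rather than an empty list, and `.str.split` leaves missing values as missing, which would break a later list comprehension. A minimal sketch of a more defensive conversion follows. The helper name `to_list` and the sample Series are assumptions for illustration and are not part of the answer.

```python
import pandas as pd

# Hypothetical input: a normal value, an empty string, and a missing value.
s = pd.Series(["G4,G3", "", None])

def to_list(value):
    """Split a comma-separated string; treat empty or missing input as no groups."""
    if pd.isna(value) or value == "":
        return []
    return value.split(",")

lists = s.apply(to_list)
print(lists.tolist())
```

With the data from the question this makes no difference, since every cell holds at least one group.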

Step 2: apply the desired transformation.

# For each row, keep only the elements that do not appear in the other column.
result = df.apply(lambda row: pd.Series({
    'group_added': [g for g in row['group_added'] if g not in row['group_removed']],
    'group_removed': [g for g in row['group_removed'] if g not in row['group_added']]
}), axis=1)

Step 3: convert the lists back to strings.

result['group_added'] = result['group_added'].apply(','.join)
result['group_removed'] = result['group_removed'].apply(','.join)

Result:

  group_added group_removed
0          G3              
1          G4              
2          G3              
3          G3            G1
4          G2            G3
5          G3            G4
6          G2              
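For anyone who wants to run the three steps in one go, here is a self-contained sketch that uses the sample data from the question. It builds the lists in a separate frame (the name `lists` is an assumption) so that `df` is left unchanged, and it copies `id` back into the result, which the steps above drop.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [4, 11, 15, 16, 26, 46, 50],
    "group_added": ["G4,G3", "G4,G3", "G2,G3", "G3", "G2", "G3", "G4,G2"],
    "group_removed": ["G4", "G3", "G2", "G1", "G3", "G4", "G4"],
})

# Step 1: split the comma-separated strings into lists.
lists = pd.DataFrame({
    "group_added": df["group_added"].str.split(","),
    "group_removed": df["group_removed"].str.split(","),
})

# Step 2: per row, keep only the elements absent from the other column.
result = lists.apply(lambda row: pd.Series({
    "group_added": [g for g in row["group_added"] if g not in row["group_removed"]],
    "group_removed": [g for g in row["group_removed"] if g not in row["group_added"]],
}), axis=1)

# Step 3: join the lists back into strings and restore the id column.
result["group_added"] = result["group_added"].apply(",".join)
result["group_removed"] = result["group_removed"].apply(",".join)
result.insert(0, "id", df["id"])

print(result)
```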

Answer 1: (score: 0)

# Keep a copy of the original group_added, because it is overwritten
# before group_removed is computed.
df["backup_group_added"] = df["group_added"]

def get_column_difference(row, left_column, right_column):
    # Elements of left_column that are absent from right_column.
    # Note: sets do not preserve the original order of the elements.
    difference = set(row[left_column].split(",")) - set(row[right_column].split(","))
    return ','.join(difference)

df['group_added'] = df.apply(lambda row: get_column_difference(row, 'group_added', 'group_removed'), axis=1)

df['group_removed'] = df.apply(lambda row: get_column_difference(row, 'group_removed', 'backup_group_added'), axis=1)

# Drop the helper column.
df.drop('backup_group_added', axis=1, inplace=True)
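Because the set difference above does not guarantee the order of the remaining elements, a row with several leftovers may come back in a different order than it went in. The following is a sketch of an order-preserving variant, not the answer's own method. The helper name `difference_in_order` is an assumption, and pairing the two original columns up front with `zip` removes the need for a backup column.

```python
import pandas as pd

def difference_in_order(left: str, right: str) -> str:
    """Return the items of `left` that are not in `right`, in their original order."""
    right_items = set(right.split(","))
    return ",".join(item for item in left.split(",") if item not in right_items)

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [4, 11, 15, 16, 26, 46, 50],
    "group_added": ["G4,G3", "G4,G3", "G2,G3", "G3", "G2", "G3", "G4,G2"],
    "group_removed": ["G4", "G3", "G2", "G1", "G3", "G4", "G4"],
})

# Pair up the original values first, so that both new columns are computed
# from the unmodified data.
pairs = list(zip(df["group_added"], df["group_removed"]))
df["group_added"] = [difference_in_order(a, r) for a, r in pairs]
df["group_removed"] = [difference_in_order(r, a) for a, r in pairs]

print(df)
```

The set is only used for membership tests on the right-hand side, so the order of the left-hand items is kept as it was.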