Pandas: removing duplicate rows based on a condition

Time: 2020-03-12 16:41:10

Tags: python-3.x pandas dataframe

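I want to drop rows from a DataFrame when (x1, x2, x3) is identical across different rows, and save the IDs of all dropped rows in a variable.

For example, with this data, I want to drop the second row: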

import pandas as pd

d = {'id': ["i1", "i2", "i3", "i4"], 'x1': [13, 13, 61, 61], 'x2': [10, 10, 13, 13], 'x3': [12, 12, 2, 22], 'x4': [24, 24, 9, 12]}
df = pd.DataFrame(data=d)

1 Answer:

Answer 0: (score: 0)

import pandas as pd

# input data
d = {'id': ["i1", "i2", "i3", "i4"], 'x1': [13, 13, 61, 61], 'x2': [10, 10, 13, 13], 'x3': [12, 12, 2, 22], 'x4': [24, 24, 9, 12]}
df = pd.DataFrame(data=d)

# create a helper column that joins the values of x1, x2 and x3 into one string
df['MergedColumn'] = df[df.columns[1:4]].apply(lambda x: ','.join(x.dropna().astype(str)), axis=1)

# keep the first row of each duplicate group, then drop the helper column
df1 = df.drop_duplicates("MergedColumn", keep='first').drop(columns="MergedColumn")

# print the de-duplicated dataframe
print(df1)

# left-merge the original dataframe with the de-duplicated one on 'id'
df2 = pd.merge(df, df1, how='left', on='id')
# rows with nulls in the right-hand columns (suffix _y) are the rows that were removed
df2 = df2[df2['x1_y'].isnull()]

# print the ids of the rows that were removed
print(df2['id'])
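
As a possible alternative to the helper column and merge above, the same result can probably be reached with a boolean mask from `DataFrame.duplicated`. This is only a minimal sketch, not the answer author's method; the names `key_cols`, `is_dup`, `removed_ids` and `df_unique` are made up for illustration.

```python
import pandas as pd

# same sample data as in the question
d = {
    "id": ["i1", "i2", "i3", "i4"],
    "x1": [13, 13, 61, 61],
    "x2": [10, 10, 13, 13],
    "x3": [12, 12, 2, 22],
    "x4": [24, 24, 9, 12],
}
df = pd.DataFrame(data=d)

# columns that define a duplicate (hypothetical name)
key_cols = ["x1", "x2", "x3"]

# True for every row that repeats an earlier (x1, x2, x3) combination
is_dup = df.duplicated(subset=key_cols, keep="first")

# ids of the rows that will be dropped
removed_ids = df.loc[is_dup, "id"].tolist()

# dataframe without the duplicate rows
df_unique = df.loc[~is_dup].reset_index(drop=True)

print(removed_ids)
print(df_unique)
```

With the sample data, `removed_ids` holds only "i2". This variant compares the column values directly instead of joining them into strings, and it leaves the original `df` without an extra column.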