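I want to delete rows from a DataFrame when their (x1, x2, x3) values are identical to those of another row, and save the IDs of all deleted rows in a variable.
For example, with this data I want to delete the second row: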
import pandas as pd

d = {'id': ["i1", "i2", "i3", "i4"], 'x1': [13, 13, 61, 61], 'x2': [10, 10, 13, 13], 'x3': [12, 12, 2, 22], 'x4': [24, 24, 9, 12]}
df = pd.DataFrame(data=d)
Answer 0 (score: 0)
import pandas as pd

#input data
d = {'id': ["i1", "i2", "i3", "i4"], 'x1': [13, 13, 61, 61], 'x2': [10, 10, 13, 13], 'x3': [12, 12, 2, 22], 'x4': [24, 24, 9, 12]}
df = pd.DataFrame(data=d)
#create new column where contents from x1, x2 and x3 columns are merged
df['MergedColumn'] = df[df.columns[1:4]].apply(lambda x: ','.join(x.dropna().astype(str)),axis=1)
#remove duplicates based on the created column and drop created column
df1 = df.drop_duplicates("MergedColumn", keep='first').drop(columns="MergedColumn")
#print output dataframe
print(df1)
#merge two dataframes
df2 = pd.merge(df, df1, how='left', on = 'id')
#find rows with null values in the right table (rows that were removed)
df2 = df2[df2['x1_y'].isnull()]
#prints ids of rows that were removed
print(df2['id'])
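
As a possible alternative to the merged-column and merge steps above, here is a minimal sketch that uses `DataFrame.duplicated` with a `subset` of columns to build a boolean mask directly. The variable names `is_dup`, `removed_ids` and `df_unique` are my own and not from the question, and it assumes "delete" means keeping the first occurrence of each (x1, x2, x3) combination.

```python
import pandas as pd

# same sample data as in the question
d = {'id': ["i1", "i2", "i3", "i4"], 'x1': [13, 13, 61, 61], 'x2': [10, 10, 13, 13], 'x3': [12, 12, 2, 22], 'x4': [24, 24, 9, 12]}
df = pd.DataFrame(data=d)

# True for every row whose (x1, x2, x3) already appeared in an earlier row
is_dup = df.duplicated(subset=['x1', 'x2', 'x3'], keep='first')

# ids of the rows that are about to be removed, stored in a variable
removed_ids = df.loc[is_dup, 'id'].tolist()

# keep only the rows that are not flagged as duplicates
df_unique = df[~is_dup]

print(removed_ids)
print(df_unique)
```

This avoids converting values to strings and leaves the original `df` without a helper column. If every member of a duplicated group should be dropped instead of keeping the first one, `keep=False` can be passed to `duplicated`.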