Concatenate two DataFrames, keeping the surplus duplicate rows from one of them and dropping the rest

Time: 2020-10-20 10:42:59

Tags: python pandas dataframe

I have two DataFrames. Both have the columns Name, Age and Interest.

df1:
      Name  Age Interest
0  ramesh    1    rugby
1   dhoni    5     coco
2     vir   14  cricket
3     vir   14  cricket
4     vir   14  cricket
5     lee    2  cricket
6     lee    2  cricket

df2:
   Name  Age Interest
0  abd    3     coco
1  vir   14  cricket
2  vir   14  cricket
3  vir   14  cricket
4  vir   14  cricket
5  vir   14  cricket
6  lee    2  cricket

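Both contain duplicate rows. I want to build another DataFrame by concatenating df1 and df2 and removing the duplicates, but the surplus duplicate records should still appear in the result. For example, if a row occurs 3 times in df1 and 5 times in df2, the result should contain 2 copies of it. It should not drop all of the duplicates.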

Expected output (result_df):

      Name  Age Interest
0  ramesh    1    rugby
1   dhoni    5     coco
2     lee    2  cricket
3     abd    3     coco
4     vir   14  cricket
5     vir   14  cricket

(The order in which the duplicates appear in the result does not matter.)
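Put differently, the rule seems to be: for each distinct row, keep as many copies as the absolute difference between its count in df1 and its count in df2. The following is only a minimal sketch of that arithmetic, not a confirmed solution; the column name `n` and the intermediate names `c1`, `c2` and `surplus` are made up for illustration.

```python
import pandas as pd

cols = ["Name", "Age", "Interest"]
df1 = pd.DataFrame(
    [["ramesh", 1, "rugby"], ["dhoni", 5, "coco"],
     ["vir", 14, "cricket"], ["vir", 14, "cricket"], ["vir", 14, "cricket"],
     ["lee", 2, "cricket"], ["lee", 2, "cricket"]],
    columns=cols,
)
df2 = pd.DataFrame(
    [["abd", 3, "coco"],
     ["vir", 14, "cricket"], ["vir", 14, "cricket"], ["vir", 14, "cricket"],
     ["vir", 14, "cricket"], ["vir", 14, "cricket"],
     ["lee", 2, "cricket"]],
    columns=cols,
)

# Count how often each distinct row occurs in each frame.
c1 = df1.groupby(cols).size()
c2 = df2.groupby(cols).size()

# A row missing from one frame counts as 0 there; the surplus is |c1 - c2|.
surplus = c1.sub(c2, fill_value=0).abs().astype(int)

# "n" is a hypothetical helper column holding the number of copies to keep.
surplus = surplus[surplus > 0].reset_index(name="n")

# Repeat each distinct row n times, then drop the helper column.
result_df = surplus.loc[surplus.index.repeat(surplus["n"]), cols]
result_df = result_df.reset_index(drop=True)
print(result_df)
```

With the sample data this should leave six rows: one each for ramesh, dhoni, lee and abd, and two for vir. The rows come out in sorted group order rather than in their original order.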

I tried drop_duplicates, but it removes all of the duplicated rows, and the "keep" argument can only retain the first or the last occurrence. How can I do this?

Sample code, which removes all duplicates:

import pandas as pd

data1 = [['ramesh', 1, 'rugby'], ['dhoni', 5, 'coco'],
         ['vir', 14, 'cricket'], ['vir', 14, 'cricket'], ['vir', 14, 'cricket'],
         ['lee', 2, 'cricket'], ['lee', 2, 'cricket']]
df1 = pd.DataFrame(data1, columns=['Name', 'Age', 'Interest'])

data2 = [['abd', 3, 'coco'],
         ['vir', 14, 'cricket'], ['vir', 14, 'cricket'], ['vir', 14, 'cricket'],
         ['vir', 14, 'cricket'], ['vir', 14, 'cricket'],
         ['lee', 2, 'cricket']]
df2 = pd.DataFrame(data2, columns=['Name', 'Age', 'Interest'])

print(df1)
print(df2)

list_df = [df1, df2]
df_concat = pd.concat(list_df)
# keep=False drops every row that has a duplicate
result_df = df_concat.drop_duplicates(keep=False)
# setting keep='first' or keep='last' doesn't help either
print(result_df)
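One possible way to stay with drop_duplicates is to number the repeated rows inside each frame first, so that only matching occurrences cancel out. This is a sketch under assumptions rather than a confirmed answer: the helper column `occ` and the names `df1_n` and `df2_n` are made up here, and the data is rebuilt so the block runs on its own.

```python
import pandas as pd

cols = ["Name", "Age", "Interest"]
df1 = pd.DataFrame(
    [["ramesh", 1, "rugby"], ["dhoni", 5, "coco"],
     ["vir", 14, "cricket"], ["vir", 14, "cricket"], ["vir", 14, "cricket"],
     ["lee", 2, "cricket"], ["lee", 2, "cricket"]],
    columns=cols,
)
df2 = pd.DataFrame(
    [["abd", 3, "coco"],
     ["vir", 14, "cricket"], ["vir", 14, "cricket"], ["vir", 14, "cricket"],
     ["vir", 14, "cricket"], ["vir", 14, "cricket"],
     ["lee", 2, "cricket"]],
    columns=cols,
)

# Number each repeated row within its own frame: 0, 1, 2, ...
# "occ" is a hypothetical helper column, not part of the original data.
df1_n = df1.assign(occ=df1.groupby(cols).cumcount())
df2_n = df2.assign(occ=df2.groupby(cols).cumcount())

# A (row, occ) pair is unique within one frame, so after concatenation it
# appears twice only if both frames contain it. keep=False drops those
# pairs and leaves the surplus occurrences from whichever frame has more.
result_df = (
    pd.concat([df1_n, df2_n], ignore_index=True)
    .drop_duplicates(keep=False)
    .drop(columns="occ")
    .reset_index(drop=True)
)
print(result_df)
```

For the sample data, the third lee row never gets a partner (df1 has two, df2 has one), and the fourth and fifth vir rows in df2 have no counterpart in df1, so one lee row and two vir rows remain alongside ramesh, dhoni and abd.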

0 answers:

No answers yet