Duplicate rows when merging dataframes in Python

Time: 2016-08-18 13:32:01

Tags: python python-2.7 python-3.x pandas merge

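I'm merging two dataframes with an inner join (how='inner'), but after the merge every row appears duplicated, even though the column I merge on (email_address) contains the same values in both files. Details: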

import pandas as pd

list_1 = pd.read_csv('list_1.csv')
list_2 = pd.read_csv('list_2.csv')

# Inner join on the shared key column
merged_list = pd.merge(list_1, list_2, on=['email_address'], how='inner')

Here are the inputs and the result:

list_1:

email_address, name, surname
john.smith@email.com, john, smith
john.smith@email.com, john, smith
elvis@email.com, elvis, presley

list_2:

email_address, street, city
john.smith@email.com, street1, NY
john.smith@email.com, street1, NY
elvis@email.com, street2, LA

merged_list:

email_address, name, surname, street, city
john.smith@email.com, john, smith, street1, NY
john.smith@email.com, john, smith, street1, NY
john.smith@email.com, john, smith, street1, NY
john.smith@email.com, john, smith, street1, NY
elvis@email.com, elvis, presley, street2, LA
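To reproduce this without the CSV files, here is a minimal sketch that builds the same two tables from strings. It assumes the files contain exactly the sample rows shown above; `skipinitialspace=True` is added only because the samples put a space after each comma. It counts how often john.smith@email.com appears in the merge result.

```python
import io

import pandas as pd

# In-memory stand-ins for list_1.csv and list_2.csv (assumption: same rows as
# the samples above). skipinitialspace strips the blank after each comma.
csv_1 = """email_address, name, surname
john.smith@email.com, john, smith
john.smith@email.com, john, smith
elvis@email.com, elvis, presley
"""
csv_2 = """email_address, street, city
john.smith@email.com, street1, NY
john.smith@email.com, street1, NY
elvis@email.com, street2, LA
"""

list_1 = pd.read_csv(io.StringIO(csv_1), skipinitialspace=True)
list_2 = pd.read_csv(io.StringIO(csv_2), skipinitialspace=True)

merged_list = pd.merge(list_1, list_2, on=['email_address'], how='inner')

# Each of the 2 john.smith rows on the left pairs with each of the 2 on the right.
john_rows = (merged_list['email_address'] == 'john.smith@email.com').sum()
print(len(merged_list), john_rows)  # 5 4
```

The four john.smith rows are the 2 × 2 combinations of the duplicated keys, not a bug in pd.merge.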

My question: shouldn't the result look like this instead?

merged_list (what I expected :D):

email_address, name, surname, street, city
john.smith@email.com, john, smith, street1, NY
john.smith@email.com, john, smith, street1, NY
elvis@email.com, elvis, presley, street2, LA

How can I get this result? Thanks a lot for your help!

1 answer:

Answer 0 (score: 9)

list_2_nodups = list_2.drop_duplicates()  # keep one copy of each identical row
merged_list = pd.merge(list_1, list_2_nodups, on=['email_address'])
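A self-contained sketch of this fix, using the question's sample rows read from strings instead of the CSV files (an assumption for the sake of a runnable example). Dropping the duplicates from list_2 alone is enough to get the expected three rows.

```python
import io

import pandas as pd

# Same sample data as in the question, read from strings instead of files.
list_1 = pd.read_csv(io.StringIO(
    "email_address,name,surname\n"
    "john.smith@email.com,john,smith\n"
    "john.smith@email.com,john,smith\n"
    "elvis@email.com,elvis,presley\n"))
list_2 = pd.read_csv(io.StringIO(
    "email_address,street,city\n"
    "john.smith@email.com,street1,NY\n"
    "john.smith@email.com,street1,NY\n"
    "elvis@email.com,street2,LA\n"))

# Drop fully identical rows from list_2 so every email appears there only once.
list_2_nodups = list_2.drop_duplicates()

# Default how='inner'; left key order (john, john, elvis) is preserved.
merged_list = pd.merge(list_1, list_2_nodups, on=['email_address'])

print(merged_list)
```

Note that drop_duplicates() with no arguments only removes rows that are identical in every column; two rows with the same email but different addresses would both be kept.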


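The duplicated rows are expected. A merge produces one output row for every pair of matching keys, so each John Smith in list_1 is matched with each John Smith in list_2 (2 × 2 = 4 rows). You have to drop the duplicates from one of the two lists; I chose list_2.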
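The pairing rule can be checked directly: for an inner merge, each key contributes (count in left) × (count in right) rows. The sketch below, again assuming the question's sample rows, computes that number from value_counts and then uses pd.merge's validate='many_to_one' option, which raises pd.errors.MergeError when the right-hand keys are not unique instead of silently multiplying rows.

```python
import io

import pandas as pd

list_1 = pd.read_csv(io.StringIO(
    "email_address,name,surname\n"
    "john.smith@email.com,john,smith\n"
    "john.smith@email.com,john,smith\n"
    "elvis@email.com,elvis,presley\n"))
list_2 = pd.read_csv(io.StringIO(
    "email_address,street,city\n"
    "john.smith@email.com,street1,NY\n"
    "john.smith@email.com,street1,NY\n"
    "elvis@email.com,street2,LA\n"))

# For an inner merge, each key k contributes count_left(k) * count_right(k) rows.
left_counts = list_1['email_address'].value_counts()
right_counts = list_2['email_address'].value_counts()
# Keys present on only one side align to NaN; fillna(0) counts them as 0 rows.
expected_rows = int((left_counts * right_counts).fillna(0).sum())
print(expected_rows)  # 5 (2*2 for john.smith + 1*1 for elvis)

# validate='many_to_one' raises if the keys in list_2 are not unique.
try:
    pd.merge(list_1, list_2, on='email_address', validate='many_to_one')
except pd.errors.MergeError as exc:
    print('duplicate keys in list_2:', exc)

# After de-duplicating list_2 the same check passes.
merged_ok = pd.merge(list_1, list_2.drop_duplicates(), on='email_address',
                     validate='many_to_one')
print(len(merged_ok))  # 3
```

Adding validate= to merges whose right side should be a lookup table is a cheap way to catch this kind of row explosion early.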