Merging pandas DataFrames on the unique values in a column

Time: 2020-07-16 02:07:30

Tags: pandas

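I have a DataFrame, df1, in which SUBJECT_ID contains many repeated values. I have a second DataFrame, df2, that I want to merge into it, but only on the unique SUBJECT_ID values. At the moment I only know how to merge on the whole SUBJECT_ID column, using the following code: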

df1 = pd.merge(df1, df2[['SUBJECT_ID', 'VALUE']], on='SUBJECT_ID', how='left')

But this merges onto every row for each SUBJECT_ID. I only need the unique SUBJECT_ID values. Please help.

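2 answers:

Answer 0: (score: 0)

I think you will find the answer in the merge documentation.

It is not entirely clear what you want, but the following example may contain the answer you are looking for: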

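import pandas as pd
from IPython.display import display  # available by default in Jupyter; imported so the code also runs as a script

df1 = pd.read_csv('temp.csv')  # df1 is loaded from a local CSV file
display(df1)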

# df2 has exactly one row per SUBJECT_ID
SUBJECT_ID = [31, 32, 33]
something_interesting = ['cat', 'dog', 'fish']
df2 = pd.DataFrame(list(zip(SUBJECT_ID, something_interesting)),
                   columns=['SUBJECT_ID', 'something_interesting'])
display(df2)


# how='outer' keeps every SUBJECT_ID that appears in either frame
df_keep_all = df1.merge(df2, on='SUBJECT_ID', how='outer')
display(df_keep_all)


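# how='inner' keeps only the rows whose SUBJECT_ID appears in both frames;
# use how='left' instead if every row of df1 must survive the merge
df_keep_df1 = df1.merge(df2, on='SUBJECT_ID', how='inner')
display(df_keep_df1)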


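# drop_duplicates() with no arguments removes only rows that are identical in every column;
# pass subset='SUBJECT_ID' to keep a single row per SUBJECT_ID
df_thinned = pd.merge(df1.drop_duplicates(), df2, on='SUBJECT_ID', how='inner')
display(df_thinned)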

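Since temp.csv is not available, here is a small self-contained sketch of how the merges above differ. The contents of df1 (including the ITEM column) are made up for illustration and are only an assumption about what the asker's data looks like; the row counts are what matter.

```python
import pandas as pd

# Hypothetical stand-in for df1: SUBJECT_ID repeats, the first two rows are
# fully identical, and SUBJECT_ID 34 has no match in df2.
df1 = pd.DataFrame({
    "SUBJECT_ID": [31, 31, 31, 32, 34],
    "ITEM": ["a", "a", "b", "c", "d"],
})
df2 = pd.DataFrame({
    "SUBJECT_ID": [31, 32, 33],
    "something_interesting": ["cat", "dog", "fish"],
})

outer = df1.merge(df2, on="SUBJECT_ID", how="outer")  # union of the keys
inner = df1.merge(df2, on="SUBJECT_ID", how="inner")  # keys present in both
left = df1.merge(df2, on="SUBJECT_ID", how="left")    # every row of df1

# No arguments: only the fully identical row is dropped.
thinned_rows = df1.drop_duplicates().merge(df2, on="SUBJECT_ID", how="inner")

# subset=: the first row for each SUBJECT_ID is kept.
thinned_ids = (
    df1.drop_duplicates(subset="SUBJECT_ID")
       .merge(df2, on="SUBJECT_ID", how="inner")
)

for name, frame in [("outer", outer), ("inner", inner), ("left", left),
                    ("thinned_rows", thinned_rows), ("thinned_ids", thinned_ids)]:
    print(name, len(frame))
```

With this toy data, thinned_rows still has two rows for SUBJECT_ID 31, while thinned_ids has exactly one row per matched SUBJECT_ID, which is probably closer to what the question asks for.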

Answer 1: (score: 0)

You can use pandas' drop_duplicates function, which removes rows that have duplicate values in one or more columns.

df2 = df.drop_duplicates(subset=['SUBJECT_ID'])
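The question can be read in two ways, so here is a rough sketch of both, using made-up data (the ITEM column and the VALUE numbers are assumptions, not the asker's data). The first reading keeps one row per SUBJECT_ID before merging. The second keeps every row of df1 but fills VALUE only on the first occurrence of each SUBJECT_ID. The validate argument of merge is optional; it is used here only to raise an error if df2 unexpectedly contains repeated keys.

```python
import pandas as pd

# Hypothetical data: df1 repeats SUBJECT_ID, df2 has one VALUE per SUBJECT_ID.
df1 = pd.DataFrame({
    "SUBJECT_ID": [31, 31, 32, 32, 32, 33],
    "ITEM": list("abcdef"),
})
df2 = pd.DataFrame({
    "SUBJECT_ID": [31, 32, 33],
    "VALUE": [1.5, 2.5, 3.5],
})

# Reading 1: reduce df1 to one row per SUBJECT_ID, then merge.
unique_ids = df1.drop_duplicates(subset=["SUBJECT_ID"])
one_per_id = unique_ids.merge(
    df2[["SUBJECT_ID", "VALUE"]],
    on="SUBJECT_ID",
    how="left",
    validate="one_to_one",  # both sides must have unique keys
)

# Reading 2: keep all rows of df1, but blank out VALUE on repeated SUBJECT_IDs.
merged = df1.merge(
    df2[["SUBJECT_ID", "VALUE"]],
    on="SUBJECT_ID",
    how="left",
    validate="many_to_one",  # df2 must have unique keys
)
is_repeat = merged.duplicated(subset=["SUBJECT_ID"])  # True after the first occurrence
merged["VALUE"] = merged["VALUE"].mask(is_repeat)     # repeats become NaN

print(one_per_id)
print(merged)
```

Which reading applies depends on whether the other columns of the repeated rows still matter after the merge.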