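I have a df1:
As shown, SUBJECT_ID has many duplicate values. I have a df2 that I want to merge in, but I want the merge to happen only on unique SUBJECT_ID values. Currently, the only way I know is to merge on all SUBJECT_ID values with the following code: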
df1 = pd.merge(df1, df2[['SUBJECT_ID', 'VALUE']], on='SUBJECT_ID', how='left')
But this merges onto every SUBJECT_ID row. I only need the unique SUBJECT_ID values. Please help.
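To illustrate the behaviour described in the question, here is a minimal sketch using small made-up frames (the EVENT column and all values are hypothetical, not the asker's data). A left merge keeps every row of df1, so each duplicate SUBJECT_ID row receives the same VALUE from df2.

```python
import pandas as pd

# Hypothetical stand-in for df1: SUBJECT_ID 31 and 32 each appear more than once
df1 = pd.DataFrame({
    'SUBJECT_ID': [31, 31, 32, 32, 32, 33],
    'EVENT': ['a', 'b', 'c', 'd', 'e', 'f'],
})

# Hypothetical stand-in for df2: one VALUE per SUBJECT_ID
df2 = pd.DataFrame({'SUBJECT_ID': [31, 32, 33], 'VALUE': [1.5, 2.5, 3.5]})

# Same call as in the question
merged = pd.merge(df1, df2[['SUBJECT_ID', 'VALUE']], on='SUBJECT_ID', how='left')
print(merged)

# The left merge preserves all 6 rows of df1; a subject's VALUE is repeated
# on every row that shares its SUBJECT_ID
```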
Answer 0 (score: 0)
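I think you will find the answer in the merge documentation.
It is not entirely clear what you want, but the following example may contain the answer you are looking for: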
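One part of the merge documentation that applies here is the validate parameter. The sketch below uses small hypothetical frames to show that validate='many_to_one' accepts repeated SUBJECT_ID values in df1, while validate='one_to_one' raises pandas.errors.MergeError. This is a quick way to confirm which side holds the duplicate keys.

```python
import pandas as pd

# Hypothetical frames: df1 repeats SUBJECT_ID values, df2 has one row per subject
df1 = pd.DataFrame({'SUBJECT_ID': [31, 31, 32, 33], 'EVENT': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'SUBJECT_ID': [31, 32, 33], 'VALUE': [1.5, 2.5, 3.5]})

# many_to_one: keys may repeat in df1 but must be unique in df2, so this succeeds
ok = pd.merge(df1, df2, on='SUBJECT_ID', how='left', validate='many_to_one')

# one_to_one: keys must be unique on both sides, so df1's duplicates cause an error
try:
    pd.merge(df1, df2, on='SUBJECT_ID', how='left', validate='one_to_one')
    one_to_one_failed = False
except pd.errors.MergeError as err:
    one_to_one_failed = True
    print('validate=one_to_one rejected the merge:', err)
```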
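import pandas as pd
from IPython.display import display  # built in under Jupyter; needs IPython elsewhere

df1 = pd.read_csv('temp.csv')  # df1 from the question, with repeated SUBJECT_ID values
display(df1)

SUBJECT_ID = [31, 32, 33]
something_interesting = ['cat', 'dog', 'fish']
df2 = pd.DataFrame(list(zip(SUBJECT_ID, something_interesting)),
                   columns=['SUBJECT_ID', 'something_interesting'])
display(df2)

# Outer join: keep every SUBJECT_ID from both frames
df_keep_all = df1.merge(df2, on='SUBJECT_ID', how='outer')
display(df_keep_all)

# Inner join: keep only rows whose SUBJECT_ID appears in both frames
df_keep_df1 = df1.merge(df2, on='SUBJECT_ID', how='inner')
display(df_keep_df1)

# drop_duplicates() with no arguments removes only rows identical in every column
df_thinned = pd.merge(df1.drop_duplicates(), df2, on='SUBJECT_ID', how='inner')
display(df_thinned)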
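The code above reads temp.csv, which is not included here, and drop_duplicates() with no arguments only removes rows that are identical in every column. The following self-contained sketch replaces the file with a small hypothetical DataFrame (the EVENT column is made up) and contrasts that call with drop_duplicates(subset='SUBJECT_ID'), which keeps one row per subject.

```python
import pandas as pd

# Hypothetical in-memory stand-in for temp.csv: the two SUBJECT_ID 31 rows are
# identical, while the two SUBJECT_ID 32 rows differ in EVENT
df1 = pd.DataFrame({
    'SUBJECT_ID': [31, 31, 32, 32, 33],
    'EVENT': ['a', 'a', 'b', 'c', 'd'],
})
df2 = pd.DataFrame({'SUBJECT_ID': [31, 32, 33],
                    'something_interesting': ['cat', 'dog', 'fish']})

# No arguments: only fully identical rows are dropped, so SUBJECT_ID 32 stays twice
thinned_rows = pd.merge(df1.drop_duplicates(), df2, on='SUBJECT_ID', how='inner')

# subset='SUBJECT_ID': keep only the first row for each subject
thinned_ids = pd.merge(df1.drop_duplicates(subset='SUBJECT_ID'), df2,
                       on='SUBJECT_ID', how='inner')

print(thinned_rows)
print(thinned_ids)
```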
Answer 1 (score: 0)
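You can use the pandas drop_duplicates function, which removes rows that repeat the values of one or more columns (passed via subset), keeping the first occurrence by default.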
df1_unique = df1.drop_duplicates(subset=['SUBJECT_ID'])  # one row per SUBJECT_ID
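Putting this together with the merge from the question, here is a minimal sketch on hypothetical data: deduplicate df1 on SUBJECT_ID first, then run the same left merge. The EVENT column and all values are made up for illustration, and the keep argument controls which duplicate survives.

```python
import pandas as pd

# Hypothetical data shaped like the question: repeated SUBJECT_IDs in df1
df1 = pd.DataFrame({
    'SUBJECT_ID': [31, 31, 32, 33, 33, 33],
    'EVENT': ['a', 'b', 'c', 'd', 'e', 'f'],
})
df2 = pd.DataFrame({'SUBJECT_ID': [31, 32, 33], 'VALUE': [1.5, 2.5, 3.5]})

# Keep one row per SUBJECT_ID (keep='first' is the default; keep='last' keeps the final one)
df1_unique = df1.drop_duplicates(subset=['SUBJECT_ID'], keep='first')

# Then merge as in the question; each subject now appears exactly once
result = pd.merge(df1_unique, df2[['SUBJECT_ID', 'VALUE']], on='SUBJECT_ID', how='left')
print(result)
```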