Pandas: merging and comparing DataFrames

Time: 2020-06-08 06:26:30

Tags: python pandas dataframe

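I have 3 DataFrames that I want to merge or join on "Label" so that I can then compare all of their columns.

Examples of the DataFrames are as follows: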

df1

Label,col1,col2,col3
NF1,1,1,6
NF2,3,2,8
NF3,4,5,4
NF4,5,7,2
NF5,6,2,2

df2

Label,col1,col1,col3
NF1,8,4,5
NF2,4,7,8
NF3,9,7,8

df3

Label,col1,col1,col3
NF1,2,8,8
NF2,6,2,0
NF3,2,2,5
NF4,2,4,9
NF5,2,5,8

and the desired result would look something like this:

Label,df1_col1,df2_col1,df3_col1,df1_col2,df2_col2,df3_col2,df1_col3,df2_col3,df3_col3
NF1,1,8,2,1,4,8,6,5,8
NF2,3,4,6,2,7,2,8,8,0
NF3,4,9,2,5,7,2,4,8,5
NF4,5,,2,7,,4,2,,9
NF5,6,,2,2,,5,2,,8

However, I am open to suggestions on how to make the comparison more readable.

Thanks!

3 answers:

Answer 0: (score: 2)

Use concat with a list of DataFrames, pass the keys parameter to add the prefixes, and sort by column name:

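import pandas as pd

dfs = [df1, df2, df3]
k = ('df1', 'df2', 'df3')
# concatenate side by side; the keys become the outer column level
df = (pd.concat([x.set_index('Label') for x in dfs], axis=1, keys=k)
        .sort_index(axis=1, level=1)   # group equal column names together
        .rename_axis('Label')
        .reset_index())
# flatten the MultiIndex columns into names such as 'df1_col1'
df.columns = df.columns.map('_'.join).str.strip('_')
print(df)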
  Label  df1_col1  df2_col1  df3_col1  df2_col1.1  df3_col1.1  df1_col2  \
0   NF1         1       8.0         2         4.0           8         1   
1   NF2         3       4.0         6         7.0           2         2   
2   NF3         4       9.0         2         7.0           2         5   
3   NF4         5       NaN         2         NaN           4         7   
4   NF5         6       NaN         2         NaN           5         2   

   df1_col3  df2_col3  df3_col3  
0         6       5.0         8  
1         8       8.0         0  
2         4       8.0         5  
3         2       NaN         9  
4         2       NaN         8  
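
On the readability question, one option is to keep the two-level column index instead of flattening it. The following is only a sketch: the sample frames are rebuilt from the question, with the assumption that the second column of df2 and df3 is meant to be col2, and the names frames, wide and by_column are made up for illustration.

```python
import pandas as pd

# Sample frames from the question (assumption: df2/df3 have col1, col2, col3).
df1 = pd.DataFrame({"Label": ["NF1", "NF2", "NF3", "NF4", "NF5"],
                    "col1": [1, 3, 4, 5, 6],
                    "col2": [1, 2, 5, 7, 2],
                    "col3": [6, 8, 4, 2, 2]})
df2 = pd.DataFrame({"Label": ["NF1", "NF2", "NF3"],
                    "col1": [8, 4, 9],
                    "col2": [4, 7, 7],
                    "col3": [5, 8, 8]})
df3 = pd.DataFrame({"Label": ["NF1", "NF2", "NF3", "NF4", "NF5"],
                    "col1": [2, 6, 2, 2, 2],
                    "col2": [8, 2, 2, 4, 5],
                    "col3": [8, 0, 5, 9, 8]})

frames = {"df1": df1, "df2": df2, "df3": df3}

# Concatenate side by side; the frame names become the outer column level.
wide = pd.concat([f.set_index("Label") for f in frames.values()],
                 axis=1, keys=list(frames))

# Swap the levels so the original column name is the outer level,
# which puts the values of each column from all frames next to each other.
by_column = wide.swaplevel(axis=1).sort_index(axis=1)

# Selecting one original column gives a small Label x frame table.
print(by_column["col1"])
```

With this layout, by_column["col2"] or by_column["col3"] can be inspected in the same way, which may be easier to read than one very wide flat table.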

Answer 1: (score: 2)

You can use df.merge:

In [1965]: res = df1.merge(df2, on='Label', how='left', suffixes=('_df1', '_df2')).merge(df3, on='Label', how='left').rename(columns={'col1': 'col1_df3','col2':'col2_df3','col3':'col3_df3'})

In [1975]: res = res.reindex(sorted(res.columns), axis=1)

In [1976]: res

Out[1976]: 
  Label  col1_df1  col1_df2  col1_df3  col2_df1  col2_df2  col2_df3  col3_df1  col3_df2  col3_df3
0   NF1         1      8.00         2         1      4.00         8         6      5.00         8
1   NF2         3      4.00         6         2      7.00         2         8      8.00         0
2   NF3         4      9.00         2         5      7.00         2         4      8.00         5
3   NF4         5       nan         2         7       nan         4         2       nan         9
4   NF5         6       nan         2         2       nan         5         2       nan         8
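
If there are more than three frames, chaining merge calls and renaming by hand gets tedious. A possible generalization, shown here only as a sketch, suffixes every frame up front and folds the list with functools.reduce. Note that it uses how='outer' rather than the how='left' used above, so labels that are missing from df1 would also be kept. The sample data assumes the second column of df2 and df3 is col2, and the names frames, tagged and res are illustrative.

```python
from functools import reduce

import pandas as pd

# Sample frames from the question (assumption: df2/df3 have col1, col2, col3).
df1 = pd.DataFrame({"Label": ["NF1", "NF2", "NF3", "NF4", "NF5"],
                    "col1": [1, 3, 4, 5, 6],
                    "col2": [1, 2, 5, 7, 2],
                    "col3": [6, 8, 4, 2, 2]})
df2 = pd.DataFrame({"Label": ["NF1", "NF2", "NF3"],
                    "col1": [8, 4, 9],
                    "col2": [4, 7, 7],
                    "col3": [5, 8, 8]})
df3 = pd.DataFrame({"Label": ["NF1", "NF2", "NF3", "NF4", "NF5"],
                    "col1": [2, 6, 2, 2, 2],
                    "col2": [8, 2, 2, 4, 5],
                    "col3": [8, 0, 5, 9, 8]})

frames = {"df1": df1, "df2": df2, "df3": df3}

# Suffix every non-key column with its frame name before merging,
# so no column names clash and no rename is needed afterwards.
tagged = [f.set_index("Label").add_suffix(f"_{name}").reset_index()
          for name, f in frames.items()]

# Fold the list into one frame with successive outer merges on Label.
res = reduce(lambda left, right: left.merge(right, on="Label", how="outer"),
             tagged)

# Keep Label first and sort the remaining columns by name.
res = res[["Label"] + sorted(c for c in res.columns if c != "Label")]
print(res)
```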

Answer 2: (score: 1)

We can use pandas' join method by setting the Label column as the index and then joining the DataFrames:

dfs = [df1,df2,df3]
keys = ['df1','df2','df3']

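# set Label as the index and prefix each frame's columns with its key
# (note: this rebinds df1 to the indexed, prefixed copy)
df1, *others = [frame.set_index("Label").add_prefix(f"{prefix}_")
                for frame, prefix in zip(dfs, keys)]

# outer-join df1 with the remaining frames on the index
outcome = df1.join(others, how='outer').rename_axis(index='Label').reset_index()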

outcome


  Label  df1_col1  df1_col2  df1_col3  df2_col1  df2_col2  df2_col3  df3_col1  df3_col2  df3_col3
0   NF1         1         1         6       8.0       4.0       5.0         2         8         8
1   NF2         3         2         8       4.0       7.0       8.0         6         2         0
2   NF3         4         5         4       9.0       7.0       8.0         2         2         5
3   NF4         5         7         2       NaN       NaN       NaN         2         4         9
4   NF5         6         2         2       NaN       NaN       NaN         2         5         8
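
Once the frames are joined, the comparison itself can also be summarized instead of read off by eye. The sketch below is one possible way, not part of the answer above: for each original column it counts how many distinct non-missing values the frames hold per Label, so a count of 1 would mean that all frames containing that Label agree. The sample data assumes the second column of df2 and df3 is col2, and the names first, joined and distinct are illustrative.

```python
import pandas as pd

# Sample frames from the question (assumption: df2/df3 have col1, col2, col3).
df1 = pd.DataFrame({"Label": ["NF1", "NF2", "NF3", "NF4", "NF5"],
                    "col1": [1, 3, 4, 5, 6],
                    "col2": [1, 2, 5, 7, 2],
                    "col3": [6, 8, 4, 2, 2]})
df2 = pd.DataFrame({"Label": ["NF1", "NF2", "NF3"],
                    "col1": [8, 4, 9],
                    "col2": [4, 7, 7],
                    "col3": [5, 8, 8]})
df3 = pd.DataFrame({"Label": ["NF1", "NF2", "NF3", "NF4", "NF5"],
                    "col1": [2, 6, 2, 2, 2],
                    "col2": [8, 2, 2, 4, 5],
                    "col3": [8, 0, 5, 9, 8]})

dfs = [df1, df2, df3]
keys = ["df1", "df2", "df3"]

# Same preparation as above: index by Label, prefix columns with the frame key.
first, *others = [f.set_index("Label").add_prefix(f"{k}_")
                  for f, k in zip(dfs, keys)]
joined = first.join(others, how="outer")

# For each original column, pick its prefixed variants and count the
# distinct non-missing values in every row (NaN is ignored by nunique).
distinct = pd.DataFrame({
    col: joined.filter(regex=f"_{col}$").nunique(axis=1)
    for col in ["col1", "col2", "col3"]
})
print(distinct)
```

Rows with a count above 1 are the ones where the frames disagree, so they can be filtered out for closer inspection.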