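How to concatenate on columns so that the column order is preserved across 2 dataframes

Asked: 2019-09-06 13:32:03

Tags: python pandas

I have 2 dataframes that I want to concatenate with each other as follows: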

df1:

index    394               min     FIC-2000      398           min       FFC
0       Recycle Gas        min       20K20       Compressor    min       20k
1       TT                 date       kg/h       AT            date       ..
2       nan              2011-03-02   -20.7                    2011-03-02
                         08:00:00                              08:00:00
3       nan              2011-03-02   -27.5                      ...
                         08:00:10


df2:

index    Unnamed:0    0       1  ..     394         395   .....
 0        Service     Prop   Prop1      Recycle Gas  RecG

The output df3 should look like this:

df3

index    Unnamed:0    0        ..     394                            395..   
0        Service     Prop       Recycle Gas                          RecG
1                               Recycle Gas       min     FIC-2000
2                                                 min       20K20
3                                       TT        date       kg/h
4                                      nan       2011-03-02   -20.7
                                                 08:00:00    
5                                      nan       2011-03-02   -27.5 
                                                 08:00:10

I tried this code:

df3 = pd.concat([df1, df2], axis=1)

but this only concatenates at index 394, and the rest of df1 is appended to the end of the df2 dataframe. Any idea how to do this?

1 answer:

Answer 0: (score: 0)

Just change to axis=0. Consider this:

Input:

>>> df
   col1  col2  col3
0     1     4     2
1     2     1     5
2     3     6   319
>>> df_1
   col4  col5  col6
0     1     4    12
1    32    12     3
2     3     2   319
>>> df_2
   col1  col3  col6
0    12    14     2
1     4   132     3
2    23    22     9

Concat with no matches (by column name):

>>> pd.concat([df, df_1], axis=0)
   col1  col2   col3  col4  col5   col6
0   1.0   4.0    2.0   NaN   NaN    NaN
1   2.0   1.0    5.0   NaN   NaN    NaN
2   3.0   6.0  319.0   NaN   NaN    NaN
0   NaN   NaN    NaN   1.0   4.0   12.0
1   NaN   NaN    NaN  32.0  12.0    3.0
2   NaN   NaN    NaN   3.0   2.0  319.0

Concat with matches:

>>> pd.concat([df, df_1, df_2], axis=0)
   col1  col2   col3  col4  col5   col6
0   1.0   4.0    2.0   NaN   NaN    NaN
1   2.0   1.0    5.0   NaN   NaN    NaN
2   3.0   6.0  319.0   NaN   NaN    NaN
0   NaN   NaN    NaN   1.0   4.0   12.0
1   NaN   NaN    NaN  32.0  12.0    3.0
2   NaN   NaN    NaN   3.0   2.0  319.0
0  12.0   NaN   14.0   NaN   NaN    2.0
1   4.0   NaN  132.0   NaN   NaN    3.0
2  23.0   NaN   22.0   NaN   NaN    9.0

Concat with matches, filling the NaN-s (you can fill None-s in the same way):

>>> pd.concat([df, df_1, df_2], axis=0).fillna(0) #in case you wish to prettify it, maybe in case of strings do .fillna('')
   col1  col2   col3  col4  col5   col6
0   1.0   4.0    2.0   0.0   0.0    0.0
1   2.0   1.0    5.0   0.0   0.0    0.0
2   3.0   6.0  319.0   0.0   0.0    0.0
0   0.0   0.0    0.0   1.0   4.0   12.0
1   0.0   0.0    0.0  32.0  12.0    3.0
2   0.0   0.0    0.0   3.0   2.0  319.0
0  12.0   0.0   14.0   0.0   0.0    2.0
1   4.0   0.0  132.0   0.0   0.0    3.0
2  23.0   0.0   22.0   0.0   0.0    9.0
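The session above can be reproduced as a script. The following is a minimal sketch that rebuilds two of the example frames and compares axis=0 with axis=1. The `ignore_index=True` argument is an addition not used in the answer; it is shown on the assumption that the consecutive 0..5 row index in the desired df3 is wanted.

```python
import pandas as pd

# Two of the small example frames from the session above.
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 1, 6], "col3": [2, 5, 319]})
df_2 = pd.DataFrame({"col1": [12, 4, 23], "col3": [14, 132, 22], "col6": [2, 3, 9]})

# axis=0 stacks the rows; columns are aligned by name and gaps become NaN.
stacked = pd.concat([df, df_2], axis=0)

# ignore_index=True (an assumption about what the OP wants) renumbers the rows 0..n-1.
renumbered = pd.concat([df, df_2], axis=0, ignore_index=True)

# axis=1 places the frames side by side, aligned on the row index,
# which is why the columns of the second frame end up on the right.
side_by_side = pd.concat([df, df_2], axis=1)

print(stacked)
print(renumbered)
print(side_by_side)
```

With axis=1 the shared names col1 and col3 appear twice in the result, which matches the behaviour described in the question, where one frame's columns are appended after the other's.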

Edit, triggered by the conversation with the OP in the comment section below.

So you do this:

(1) Concatenate the dataframes:

df3=pd.concat([df1,df2], axis=0)

(2) Join another dataframe onto the result:

df5=pd.merge(df3, df4[["FIC", "min"]], on="FIC", how="outer")

(You may want to consider the suffixes parameter, if you think it is relevant.) REF https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html
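
To illustrate the suffixes remark, here is a rough sketch of the merge step. The contents of df3 and df4 are invented for illustration, since the thread shows neither the real df4 nor a clean "FIC" column. The suffix labels "_df3" and "_df4" are likewise arbitrary choices.

```python
import pandas as pd

# Hypothetical stand-ins for the real frames; the values are made up.
df3 = pd.DataFrame({"FIC": ["FIC-2000", "FIC-2001"], "min": [-20.7, -27.5]})
df4 = pd.DataFrame({"FIC": ["FIC-2001", "FIC-2002"], "min": [1.5, 3.0], "other": ["a", "b"]})

# Both frames carry a "min" column, so suffixes tells them apart in the result.
# how="outer" keeps keys that appear in either frame.
df5 = pd.merge(
    df3,
    df4[["FIC", "min"]],
    on="FIC",
    how="outer",
    suffixes=("_df3", "_df4"),
)

print(df5)
```

Without the suffixes argument, pandas falls back to its defaults and names the overlapping columns min_x and min_y.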