Question

I can't find a proper way to concatenate only the new values of colA. It should be simple: I need to add the new elements of colA from DF2 to DF1.

DF1
colA  colB  colC
 a      5     7
 b      4     5
 c      5     6

DF2
colA  colE  colF
 a      7     e
 b      d     4
 c      f     g
 d      h     h
 e      4     r

I tried simple code like this, but the output dataframe is not correct:

DF3 = pd.concat([DF1, DF2['ColA']], keys=["ColA"])
DF3.drop_duplicates(subset=['ColA'], inplace=True, keep='last')

The result is that [a, 5, 7] is dropped and replaced by [a, nan, nan].

What I need is this:

DF3 merged colA
colA  colB  colC
 a      5     7
 b      4     5
 c      5     6
 d
 e

I will then fill in the missing values of DF3 manually. I don't need colE and colF in DF3.

Answer 1

You can use pandas.DataFrame.merge:

>>> DF1.merge(DF2, how='outer', on='colA').reindex(DF1.columns, axis=1)
  colA  colB  colC
0    a   5.0   7.0
1    b   4.0   5.0
2    c   5.0   6.0
3    d   NaN   NaN
4    e   NaN   NaN
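
To reproduce this, here is a minimal self-contained sketch. The frame construction is rebuilt from the tables in the question (the mixed values in colE and colF are copied as shown), so treat it as an assumption rather than the asker's real data.

```python
import pandas as pd

# Frames rebuilt from the tables in the question (assumed, not the asker's real data).
DF1 = pd.DataFrame({"colA": ["a", "b", "c"], "colB": [5, 4, 5], "colC": [7, 5, 6]})
DF2 = pd.DataFrame({
    "colA": ["a", "b", "c", "d", "e"],
    "colE": [7, "d", "f", "h", 4],
    "colF": ["e", 4, "g", "h", "r"],
})

# The outer merge keeps every colA value found in either frame;
# reindex then keeps only DF1's columns, which drops colE and colF.
DF3 = DF1.merge(DF2, how="outer", on="colA").reindex(DF1.columns, axis=1)
print(DF3)
```

colB and colC come back as floats because a plain int64 column cannot hold NaN.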

Edit: to remove the NaNs and convert the other values back to int, you can try:

>>> DF1.merge(DF2['colA'], how='outer').fillna(-1, downcast='infer').replace({-1:''})
  colA colB colC
0    a    5    7
1    b    4    5
2    c    5    6
3    d          
4    e          

# If using -1 as a placeholder is a concern, convert to "Int64" instead
>>> DF1.astype({'colB': 'Int64', 'colC': 'Int64'}).merge(DF2['colA'], how='outer')
  colA  colB  colC
0    a     5     7
1    b     4     5
2    c     5     6
3    d  <NA>  <NA>
4    e  <NA>  <NA>

# You can replace the missing values with an empty string as well (needs `import numpy as np`):
>>> DF1.astype({
      'colB': 'Int64', 
      'colC': 'Int64'
    }).merge(DF2['colA'], how='outer').replace({np.nan: ''})

  colA colB colC
0    a    5    7
1    b    4    5
2    c    5    6
3    d          
4    e
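
As a runnable version of the "Int64" variant, the sketch below rebuilds the frames from the question (an assumption, as the original data is not given in code form) and passes `DF2[['colA']]` as a one-column frame instead of a Series.

```python
import pandas as pd

# Frames rebuilt from the tables in the question (assumed data).
DF1 = pd.DataFrame({"colA": ["a", "b", "c"], "colB": [5, 4, 5], "colC": [7, 5, 6]})
DF2 = pd.DataFrame({
    "colA": ["a", "b", "c", "d", "e"],
    "colE": [7, "d", "f", "h", 4],
    "colF": ["e", 4, "g", "h", "r"],
})

# Cast to the nullable integer dtype first, so the rows added by the
# outer merge hold <NA> instead of forcing the columns to float.
DF3 = (
    DF1.astype({"colB": "Int64", "colC": "Int64"})
       .merge(DF2[["colA"]], how="outer")
)
print(DF3)
print(DF3.dtypes)
```

With the nullable dtype the existing values stay integers, which makes the later manual fill-in of colB and colC less error-prone.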

Answer 2

Remove keep='last' so that the default, keep='first', is used. Change:

DF3.drop_duplicates(subset=['ColA'], inplace=True, keep='last')

to:

DF3.drop_duplicates(subset=['ColA'], inplace=True)
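
A minimal end-to-end sketch of this approach follows. It makes two assumptions that differ from the snippet in the question: the column is spelled colA (as in the tables), and `DF2[["colA"]]` is selected with double brackets so that a one-column frame, not a Series, is concatenated. The frames are rebuilt from the question's tables.

```python
import pandas as pd

# Frames rebuilt from the tables in the question (assumed data).
DF1 = pd.DataFrame({"colA": ["a", "b", "c"], "colB": [5, 4, 5], "colC": [7, 5, 6]})
DF2 = pd.DataFrame({
    "colA": ["a", "b", "c", "d", "e"],
    "colE": [7, "d", "f", "h", 4],
    "colF": ["e", 4, "g", "h", "r"],
})

# Stack DF1 on top of DF2's key column; DF1's rows come first.
stacked = pd.concat([DF1, DF2[["colA"]]], ignore_index=True)

# keep='first' (the default) keeps DF1's complete rows for a, b, c
# and only lets the new keys d and e through from DF2.
DF3 = stacked.drop_duplicates(subset=["colA"]).reset_index(drop=True)
print(DF3)
```

With keep='last' the later, NaN-filled copies from DF2 win instead, which is the [a, nan, nan] result described in the question.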

Answer 3

Or just do an outer merge with DF2[['colA']]:

DF1.merge(DF2[['colA']], how='outer')
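
As a quick check, the sketch below compares merging only the key column against merging all of DF2 and then trimming the columns. The frames are rebuilt from the question's tables (an assumption), and the variable names `key_only` and `full_then_trim` are made up for illustration.

```python
import pandas as pd

# Frames rebuilt from the tables in the question (assumed data).
DF1 = pd.DataFrame({"colA": ["a", "b", "c"], "colB": [5, 4, 5], "colC": [7, 5, 6]})
DF2 = pd.DataFrame({
    "colA": ["a", "b", "c", "d", "e"],
    "colE": [7, "d", "f", "h", 4],
    "colF": ["e", 4, "g", "h", "r"],
})

# Merge only the key column: colE and colF never enter the result.
key_only = DF1.merge(DF2[["colA"]], how="outer")

# Merge everything, then keep only DF1's columns.
full_then_trim = DF1.merge(DF2, how="outer", on="colA").reindex(DF1.columns, axis=1)

print(key_only)
print(key_only.equals(full_then_trim))
```

Selecting `DF2[["colA"]]` up front avoids carrying the unwanted columns through the merge at all.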
