Question

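I have a dataframe with 10 columns:

df1: col1, col2, col3, col4, col5, col6, col7, col8, col9, col10

and another dataframe with 5 columns:

 df2: col1, col2, col6, col9, col3

I want to compare df2 against df1 and add to df2 the df1 columns it is missing.

This is not a duplicate of "Compare Pandas dataframes and add column": I don't want to add any values from df1, only blank columns.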

Answer 1

import pandas as pd

dfa = pd.DataFrame({'a':[1,2,3], 'b':[5,6,7]})
dfb = pd.DataFrame({'a':[7,7,7], 'c':[4,4,4], 'e':[0,0,0]})

>>> dfa
   a  b
0  1  5
1  2  6
2  3  7
>>> dfb
   a  c  e
0  7  4  0
1  7  4  0
2  7  4  0

Find the columns that are in dfb but not in dfa:

>>> col_diff = dfb.columns.difference(dfa.columns)
>>> col_diff
Index(['c', 'e'], dtype='object')
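
One detail worth knowing: Index.difference sorts its result by default, so the new columns may not come back in the order they appear in dfb. Below is a minimal sketch of an order-preserving alternative; the frames (including the column name 'z') are made up for illustration and are not the ones used in this answer.

```python
import pandas as pd

# Hypothetical frames; 'z' comes first in dfb to make the ordering visible.
dfa = pd.DataFrame({'a': [1, 2, 3], 'b': [5, 6, 7]})
dfb = pd.DataFrame({'z': [0, 0, 0], 'a': [7, 7, 7], 'c': [4, 4, 4]})

# Index.difference sorts the resulting labels by default.
sorted_diff = dfb.columns.difference(dfa.columns).tolist()

# A list comprehension keeps the labels in dfb's original column order.
ordered_diff = [col for col in dfb.columns if col not in dfa.columns]

print(sorted_diff)   # ['c', 'z']
print(ordered_diff)  # ['z', 'c']
```

Either list can be used in the steps that follow; the choice only affects the order of the added columns.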

Turn the new columns into a list and add them:

>>> new = col_diff.tolist()
>>> new
['c', 'e']
>>> 
>>> for col in new:
...     dfa[col] = None

>>> dfa
   a  b     c     e
0  1  5  None  None
1  2  6  None  None
2  3  7  None  None

Using DataFrame.assign (same initial DataFrames):

>>> # try it when the df indices are different
>>> dfc = dfb.set_index('a')
>>> dfc
   c  e
a      
7  4  0
7  4  0
7  4  0

>>> diff = dfc.columns.difference(dfa.columns)
>>> new = diff.tolist()
>>> new = {col:None for col in new}
>>> dfa = dfa.assign(**new)

>>> dfa
   a  b     c     e
0  1  5  None  None
1  2  6  None  None
2  3  7  None  None
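
As a possible alternative that is not part of this answer, DataFrame.reindex can add all the missing columns in one call. This is a sketch using the same dfa and dfb as above; note that the new columns are filled with NaN rather than None, and that reindex returns a new frame instead of modifying dfa.

```python
import pandas as pd

dfa = pd.DataFrame({'a': [1, 2, 3], 'b': [5, 6, 7]})
dfb = pd.DataFrame({'a': [7, 7, 7], 'c': [4, 4, 4], 'e': [0, 0, 0]})

# Columns that dfb has and dfa lacks, in dfb's order.
missing = [col for col in dfb.columns if col not in dfa.columns]

# Keep dfa's columns first, then append the missing ones.
# Labels that do not exist in dfa yet become all-NaN columns.
result = dfa.reindex(columns=list(dfa.columns) + missing)

print(result)
```

Because only column labels are compared, this works whether or not the two frames share an index.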

Answer 2

For this to work, the indexes must match. Assuming they do, try something like:

pd.concat([df1.drop(df2.columns, axis=1), df2], axis=1)
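
A small sketch of what the line above does, on made-up frames shaped like the ones in the question (the values are hypothetical). It assumes every column of df2 also exists in df1, as in the question; otherwise drop would raise. The concat brings df1's values along with the extra columns, so a blank-column variant is shown next to it for comparison.

```python
import pandas as pd

# Hypothetical data; df2's columns are a subset of df1's.
df1 = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4],
                    'col3': [5, 6], 'col4': [7, 8]})
df2 = pd.DataFrame({'col1': [10, 20], 'col3': [30, 40]})

# The approach above: df1's extra columns arrive together with df1's values.
with_values = pd.concat([df1.drop(df2.columns, axis=1), df2], axis=1)

# Blank variant: an empty frame on df2's index holding only the missing labels.
missing = df1.columns.difference(df2.columns)
blanks = pd.DataFrame(index=df2.index, columns=missing)
with_blanks = pd.concat([df2, blanks], axis=1)

print(with_values)
print(with_blanks)
```

In with_blanks, col2 and col4 contain only NaN, which matches the question's request for blank columns.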

Compare 2 dataframes and add the differing columns, Python 3.6
