My DataFrame has 10 columns:
df1: col1, col2, col3, col4, col5, col6, col7, col8, col9, col10
and another DataFrame has 5 columns:
df2: col1, col2, col6, col9, col3
I want to compare df2 with df1 and add to df2 every column of df1 that df2 does not have.
This is not a duplicate of Compare Pandas dataframes and add column: I don't want to copy any values from df1, I only want to add the missing columns as blank columns.
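A compact alternative, not taken from the answers, is to `reindex` df2 with its own columns plus the ones it lacks; `DataFrame.reindex` creates each new label as a column filled with NaN and leaves the existing data untouched. A minimal sketch, assuming made-up toy values (only the column names come from the question):

```python
import pandas as pd

# Hypothetical toy data: the column names match the question, the values are made up
df1 = pd.DataFrame({f"col{i}": [i, i * 10] for i in range(1, 11)})
df2 = pd.DataFrame({c: [0, 0] for c in ["col1", "col2", "col6", "col9", "col3"]})

# Columns of df1 that df2 lacks, kept in df1's order
missing = [c for c in df1.columns if c not in df2.columns]

# reindex adds each missing label as a new all-NaN column; existing columns keep their data
df2 = df2.reindex(columns=list(df2.columns) + missing)
print(df2)
```

The new columns hold NaN (float dtype) rather than None; either counts as "blank" for `isna()`.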
Answer 0 (score: 1)
>>> import pandas as pd
>>> dfa = pd.DataFrame({'a':[1,2,3], 'b':[5,6,7]})
>>> dfb = pd.DataFrame({'a':[7,7,7], 'c':[4,4,4], 'e':[0,0,0]})
>>> dfa
   a  b
0  1  5
1  2  6
2  3  7
>>> dfb
   a  c  e
0  7  4  0
1  7  4  0
2  7  4  0
Find the columns of dfb that dfa does not have:
>>> col_diff = dfb.columns.difference(dfa.columns)
>>> col_diff
Index(['c', 'e'], dtype='object')
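`Index.difference` sorts its result by default. If the new columns should keep the order they have in the other frame, passing `sort=False` should preserve it. A small sketch, where `dfz` is a hypothetical frame whose extra columns are deliberately not in alphabetical order:

```python
import pandas as pd

dfa = pd.DataFrame({'a': [1, 2, 3], 'b': [5, 6, 7]})
# Hypothetical frame whose extra columns are not alphabetical
dfz = pd.DataFrame({'a': [7, 7, 7], 'z': [1, 1, 1], 'm': [2, 2, 2]})

sorted_diff = dfz.columns.difference(dfa.columns)               # default: result is sorted
ordered_diff = dfz.columns.difference(dfa.columns, sort=False)  # keeps dfz's column order

print(sorted_diff.tolist())   # ['m', 'z']
print(ordered_diff.tolist())  # ['z', 'm']
```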
Convert the difference to a list and add each new column:
>>> new = col_diff.tolist()
>>> new
['c', 'e']
>>>
>>> for col in new:
...     dfa[col] = None
...
>>> dfa
   a  b     c     e
0  1  5  None  None
1  2  6  None  None
2  3  7  None  None
>>>
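Outside an interactive session, the same loop can be wrapped in a small function. A sketch under the assumption that you want to leave the original frame unmodified; the name `add_blank_columns` is hypothetical, not a pandas API:

```python
import pandas as pd

def add_blank_columns(target: pd.DataFrame, source: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of `target` with an empty (None) column
    for every column of `source` that `target` lacks."""
    result = target.copy()  # work on a copy so the caller's frame is not mutated
    for col in source.columns.difference(target.columns):
        result[col] = None  # a scalar None is broadcast to every row
    return result

dfa = pd.DataFrame({'a': [1, 2, 3], 'b': [5, 6, 7]})
dfb = pd.DataFrame({'a': [7, 7, 7], 'c': [4, 4, 4], 'e': [0, 0, 0]})
out = add_blank_columns(dfa, dfb)
print(out)
```

Returning a copy costs some memory but avoids surprising callers who still hold a reference to `dfa`.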
Using DataFrame.assign (starting from the same initial DataFrames):
>>> # try it when the df indices are different
>>> dfc = dfb.set_index('a')
>>> dfc
   c  e
a
7  4  0
7  4  0
7  4  0
>>> diff = dfc.columns.difference(dfa.columns)
>>> new = diff.tolist()
>>> new = {col:None for col in new}
>>> dfa = dfa.assign(**new)
>>> dfa
   a  b     c     e
0  1  5  None  None
1  2  6  None  None
2  3  7  None  None
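Because every new column gets the scalar None, the result uses dfa's own index, so dfc's different index has no effect. The two steps that build the dict can also be collapsed with `dict.fromkeys`; a sketch, assuming the column names are strings (keyword unpacking with `**` rejects non-string keys):

```python
import pandas as pd

dfa = pd.DataFrame({'a': [1, 2, 3], 'b': [5, 6, 7]})
dfc = pd.DataFrame({'a': [7, 7, 7], 'c': [4, 4, 4], 'e': [0, 0, 0]}).set_index('a')

# dict.fromkeys maps every missing column name to None in one step
new = dict.fromkeys(dfc.columns.difference(dfa.columns), None)
# assign(**new) turns each name into a keyword argument, so names must be strings
dfa = dfa.assign(**new)
print(dfa)
```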
Answer 1 (score: 0)
For this to work, the indexes must match. Assuming they do, try something like:
pd.concat([df1.drop(df2.columns, axis=1), df2], axis=1)
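As written, that concat carries over df1's values for the columns df2 lacks. If only blank columns are wanted, as the question asks, one sketch that keeps the concat idea is to concatenate an all-NaN frame built on df2's own index. Since that frame shares df2's index, the index-matching requirement no longer applies. The toy frames below are hypothetical:

```python
import pandas as pd

# Hypothetical toy frames; df2 deliberately has a different index from df1
df1 = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4], 'col4': [5, 6], 'col5': [7, 8]})
df2 = pd.DataFrame({'col1': [9, 9], 'col2': [0, 0]}, index=[10, 20])

missing = df1.columns.difference(df2.columns)
# A frame with only an index and column labels holds NaN in every cell
blank = pd.DataFrame(index=df2.index, columns=missing)
result = pd.concat([df2, blank], axis=1)
print(result)
```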