I have an existing DataFrame, and a method that computes a few columns to add to that DataFrame. I currently use pd.concat([left, right], axis=1). When I call this method a second time, however, it adds the columns again (with the same name).
With the following sample data frames left and right:
left = pd.DataFrame({'one': [1, 2, 3], 'two': [2, 3, 4]})
print(left)
one two
0 1 2
1 2 3
2 3 4
right = pd.DataFrame({'one': [22, 22, 22], 'NEW': [33, 33, 33]})
print(right)
one NEW
0 22 33
1 22 33
2 22 33
I am looking for a foo method whose result is the following:
left = left.foo(right) # or foo(left, right)
print(left)
one two NEW
0 22 2 33
1 22 3 33
2 22 4 33
And, importantly, if I call left.foo(right) a second time, I want the result to stay the same.
DataFrame.join raises an error when a column already exists, pd.concat doesn't overwrite existing columns, and DataFrame.update only overwrites existing columns but doesn't add new ones.
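To make the three behaviours concrete, here is a minimal sketch using the sample frames from above. The variable names join_failed, concat_cols, updated and update_cols are only illustrative.

```python
import pandas as pd

left = pd.DataFrame({'one': [1, 2, 3], 'two': [2, 3, 4]})
right = pd.DataFrame({'one': [22, 22, 22], 'NEW': [33, 33, 33]})

# DataFrame.join refuses overlapping column names unless suffixes are given.
try:
    left.join(right)
    join_failed = False
except ValueError:
    join_failed = True

# pd.concat keeps both copies of 'one', so the result has duplicate labels.
concat_cols = list(pd.concat([left, right], axis=1).columns)

# DataFrame.update modifies the frame in place (a copy is used here):
# it overwrites 'one' but never adds 'NEW'.
updated = left.copy()
updated.update(right)
update_cols = list(updated.columns)
```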
Is there a function/method to do what I want or do I have to write one myself?
Solution: The solution that worked for me, combined from the two answers below, is:
result = left.\
drop(left.columns.intersection(right.columns), axis=1).\
join(right)
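As a rough sketch, the expression above can be wrapped in a small helper to check that a second call leaves the result unchanged. The function name add_or_overwrite is hypothetical, not part of pandas, and the sketch assumes both frames share the same index.

```python
import pandas as pd

def add_or_overwrite(left, right):
    """Return left with right's columns added, replacing any that overlap."""
    # Columns present in both frames are dropped from left first,
    # so the index-based join cannot hit a name collision.
    overlap = left.columns.intersection(right.columns)
    return left.drop(columns=overlap).join(right)

left = pd.DataFrame({'one': [1, 2, 3], 'two': [2, 3, 4]})
right = pd.DataFrame({'one': [22, 22, 22], 'NEW': [33, 33, 33]})

once = add_or_overwrite(left, right)
twice = add_or_overwrite(once, right)  # second call on the already-merged frame
```

Here once.equals(twice) should hold, since the overlapping columns are removed before every join.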
Answer 0 (score: 2)
Take the intersection of the columns, drop those columns from left, then merge on the index:
left = left.drop(left.columns.intersection(right.columns), axis=1).merge(right, left_index=True, right_index=True)
print(left)
two one NEW
0 2 22 33
1 3 22 33
2 4 22 33
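Note that the output above has the column order two, one, NEW, while the question asked for one, two, NEW. If the order matters, one possible sketch is to assign the columns of right one by one with DataFrame.assign, which replaces existing columns in their current position and appends new ones at the end. The helper name assign_columns is hypothetical, and the sketch assumes the column names are strings (they are passed as keyword arguments) and that both frames share the same index.

```python
import pandas as pd

def assign_columns(left, right):
    """Return a copy of left with every column of right assigned onto it."""
    # assign() overwrites an existing column where it already sits
    # and appends columns that left does not have yet.
    return left.assign(**{col: right[col] for col in right.columns})

left = pd.DataFrame({'one': [1, 2, 3], 'two': [2, 3, 4]})
right = pd.DataFrame({'one': [22, 22, 22], 'NEW': [33, 33, 33]})

result = assign_columns(left, right)
again = assign_columns(result, right)  # repeating the call changes nothing
```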
Answer 1 (score: 1)
An alternative solution, but it only adds new columns and does not overwrite existing ones:
left = pd.concat([left, right[right.columns.difference(left.columns)]], axis=1)
print (left)
one two NEW
0 1 2 33
1 2 3 33
2 3 4 33