Question

I have an existing DataFrame, and a method that computes a few columns to add to that DataFrame. I currently use pd.concat([left, right], axis=1). When I call this method a second time, however, it adds the columns again (with the same name).

With the following sample data frames left and right:

left = pd.DataFrame({'one': [1, 2, 3], 'two': [2, 3, 4]})
print(left)

   one  two
0    1    2
1    2    3
2    3    4

right = pd.DataFrame({'one': [22, 22, 22], 'NEW': [33, 33, 33]})
print(right)

   one  NEW
0   22   33
1   22   33
2   22   33

I am looking for a foo method whose result is the following:

left = left.foo(right)  # or foo(left, right)
print(left)

   one  two  NEW
0   22    2   33
1   22    3   33
2   22    4   33

And, importantly, if I call left.foo(right) a second time, I want the result to stay the same.

DataFrame.join raises an error when a column already exists, pd.concat doesn't overwrite existing columns, and DataFrame.update only overwrites existing columns but doesn't add new ones.
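The three behaviours described above can be reproduced with a short sketch. It reuses the sample frames from the question; the variable names join_raised, concatenated and updated are only illustrative.

```python
import pandas as pd

left = pd.DataFrame({'one': [1, 2, 3], 'two': [2, 3, 4]})
right = pd.DataFrame({'one': [22, 22, 22], 'NEW': [33, 33, 33]})

# join: overlapping column names raise ValueError unless a suffix is given.
try:
    left.join(right)
    join_raised = False
except ValueError:
    join_raised = True

# concat: keeps both 'one' columns, so the name ends up duplicated.
concatenated = pd.concat([left, right], axis=1)

# update: works in place (here on a copy); 'one' is overwritten,
# but 'NEW' is not added.
updated = left.copy()
updated.update(right)

print(join_raised)
print(list(concatenated.columns))
print(list(updated.columns))
```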

Is there a function/method to do what I want or do I have to write one myself?

Solution: The solution that worked for me, combined from the two answers below, is:

result = (
    left
    .drop(left.columns.intersection(right.columns), axis=1)
    .join(right)
)
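As a rough sketch, the same expression can be wrapped in a helper to check that a second call leaves the result unchanged. The function name add_or_update_columns is hypothetical, not a pandas API, and the sketch assumes both frames share the same index.

```python
import pandas as pd


def add_or_update_columns(left, right):
    """Return left with right's columns added, replacing any that overlap."""
    # Columns present in both frames; right's version of these wins.
    overlap = left.columns.intersection(right.columns)
    # Drop the overlapping columns from left, then join right on the index.
    return left.drop(columns=overlap).join(right)


left = pd.DataFrame({'one': [1, 2, 3], 'two': [2, 3, 4]})
right = pd.DataFrame({'one': [22, 22, 22], 'NEW': [33, 33, 33]})

once = add_or_update_columns(left, right)
twice = add_or_update_columns(once, right)

# A second call should give the same frame as the first.
print(once.equals(twice))
print(list(once.columns))
```

Note that the overwritten columns move to the right of the frame, so the column order differs from the layout shown in the question. If the order matters, the result can be reindexed afterwards.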

Answer 1

Take the intersection of the column names, drop those columns from left, then merge on the index:

left = left.drop(left.columns.intersection(right.columns), axis=1).merge(right, left_index=True, right_index=True)

print(left)
   two  one  NEW
0    2   22   33
1    3   22   33
2    4   22   33

Answer 2

Alternative solution, but it only adds new columns and does not overwrite existing ones:

left = pd.concat([left, right[right.columns.difference(left.columns)]], axis=1)
print(left)
   one  two  NEW
0    1    2   33
1    2    3   33
2    3    4   33
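A minimal sketch of this approach as a reusable function, to show that a repeated call adds nothing further. The name add_missing_columns is hypothetical. Existing columns such as 'one' keep their original values.

```python
import pandas as pd


def add_missing_columns(left, right):
    """Return left plus only those columns of right that left lacks."""
    # Column names that exist in right but not in left.
    new_cols = right.columns.difference(left.columns)
    return pd.concat([left, right[new_cols]], axis=1)


left = pd.DataFrame({'one': [1, 2, 3], 'two': [2, 3, 4]})
right = pd.DataFrame({'one': [22, 22, 22], 'NEW': [33, 33, 33]})

once = add_missing_columns(left, right)
# On the second call the difference is empty, so nothing is appended.
twice = add_missing_columns(once, right)

print(list(once.columns))
print(list(twice.columns))
```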

How to add or update columns in a pandas DataFrame?
