Why does my boolean conditional generate a 'ValueError: Can only compare identically-labeled Series objects'?

时间:2019-04-16 23:22:14

标签: python pandas numpy

I'm trying to add a feature column to my dataframe and match it to my existing dataframe rows by month and year (which I've stored in integer columns).

I've tried using .iloc[] to specify the row to add the new feature variable df['Price Level'] that is taken from i_df['CPIAUCNS'], but after reading a lot of Stack Overflow, it seems like np.where is a more appropriate function for a conditional statement.

bool_filter = ((df['Release Date Year'] == i_df['Release Date Year']) 
               & (df['Release Date Month'] == i_df['Release Date Month']))

df['Price Level'] = np.where(bool_filter, i_df['CPIAUCNS'])

I was hoping this would generate a new feature column in df with the value from i_df where Year and Month were equal, instead I receive:

ValueError: Can only compare identically-labeled Series objects

This error is thrown in the bool_filter so the np.where does not get to execute.

Would someone be able to explain why this conditional statement generates this error and how I might be able to rephrase it?

EDIT:

Trying to use .values() in the boolean filter results in the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-34-9b470b5aee2c> in <module>()
      5 # df[df['Release Date'].isna() == True]
      6 
----> 7 bool_filter = ((df['Release Date Year'].values() == i_df['Release Date Year'].values()) 
      8                & (df['Release Date Month'].values() == i_df['Release Date Month'].values()))
      9 

TypeError: 'numpy.ndarray' object is not callable

2 个答案:

答案 0 :(得分:0)

SOLUTION #1

You should use df.merge()

df = df.merge(i_df, how='left', left_on=['Release Date Year', 'Release Date Month'],
right_on=['Release Date Year', 'Release Date Month'])

This will join you i_df df to your df dataframe. It will do a left join in this example, but feel free to change the join type.

You will end up with a new df with the column you desire.


SOLUTION #2

Another solution, would be to use your boolean filter to filter your i_df dataframe

bool_filter = ((df['Release Date Year'] == i_df['Release Date Year']) 
               & (df['Release Date Month'] == i_df['Release Date Month']))

df['Price Level'] = i_df[bool_filter == True].CPIAUCNS

Now, this will consider that indexes of both dataframe are aligned. Becarefully if you cannot guarantee that both indexes are aligned.

答案 1 :(得分:0)

Based on Teddy's answer I finally succeeded with the following merge statement:

df = df.merge(i_df[['Release Date Year', 'Release Date Month','CPIAUCNS']], 
              how='left', on=['Release Date Year', 'Release Date Month'])

I ended up with CPIAUCNS in my dataframe, which was my goal. Thanks Teddy!

However, I still don't understand the problem with my initial bool_filter.