I'm trying to add a feature column to my dataframe and match it to my existing dataframe rows by month and year (which I've stored in integer columns).
I've tried using .iloc[] to specify the row to add the new feature variable df['Price Level'], which is taken from i_df['CPIAUCNS'], but after reading a lot of Stack Overflow, it seems like np.where is a more appropriate function for a conditional statement.
bool_filter = ((df['Release Date Year'] == i_df['Release Date Year'])
& (df['Release Date Month'] == i_df['Release Date Month']))
df['Price Level'] = np.where(bool_filter, i_df['CPIAUCNS'])
I was hoping this would generate a new feature column in df with the value from i_df where Year and Month were equal. Instead, I receive:
ValueError: Can only compare identically-labeled Series objects
This error is raised while bool_filter is being built, so the np.where call never executes.
Would someone be able to explain why this conditional statement generates this error and how I might be able to rephrase it?
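A minimal reproduction, assuming df and i_df have different lengths (the frames below are hypothetical). `==` between two Series does not search i_df for a matching year and month; it pairs up elements by index label, and pandas refuses to do that unless both Series have exactly the same index (same labels, same order):

```python
import pandas as pd

# Hypothetical data: df has 3 rows, i_df has 2, so their indexes differ.
df = pd.DataFrame({"Release Date Year": [2018, 2018, 2019],
                   "Release Date Month": [1, 3, 7]})
i_df = pd.DataFrame({"Release Date Year": [2018, 2018],
                     "Release Date Month": [3, 1],
                     "CPIAUCNS": [102.0, 100.0]})

message = None
try:
    # Series == Series is an element-wise comparison on matching index labels;
    # with non-identical indexes pandas raises instead of aligning.
    bool_filter = df["Release Date Year"] == i_df["Release Date Year"]
except ValueError as err:
    message = str(err)

print(message)  # Can only compare identically-labeled Series objects
```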
EDIT:
Trying to use .values() in the boolean filter results in the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-34-9b470b5aee2c> in <module>()
5 # df[df['Release Date'].isna() == True]
6
----> 7 bool_filter = ((df['Release Date Year'].values() == i_df['Release Date Year'].values())
8 & (df['Release Date Month'].values() == i_df['Release Date Month'].values()))
9
TypeError: 'numpy.ndarray' object is not callable
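`.values` is an attribute holding a NumPy array, not a method, so `.values()` tries to call the array itself, hence `'numpy.ndarray' object is not callable`. Dropping the parentheses (or calling `.to_numpy()`) avoids the TypeError, but the comparison stays purely positional: element 0 against element 0, element 1 against element 1, and so on. A sketch with hypothetical same-length frames whose rows are in a different order shows that nothing matches, even though every month appears in both:

```python
import pandas as pd

# Hypothetical data: the same three months, but i_df lists them in another order.
df = pd.DataFrame({"Release Date Year": [2018, 2018, 2018],
                   "Release Date Month": [1, 2, 3]})
i_df = pd.DataFrame({"Release Date Year": [2018, 2018, 2018],
                     "Release Date Month": [3, 1, 2],
                     "CPIAUCNS": [102.0, 100.0, 101.0]})

# .to_numpy() is the method form; .values (no parentheses) is the attribute form.
positional = ((df["Release Date Year"].to_numpy() == i_df["Release Date Year"].to_numpy())
              & (df["Release Date Month"].to_numpy() == i_df["Release Date Month"].to_numpy()))

print(positional)  # [False False False]
```

Row-wise equality answers "is row i of df the same month as row i of i_df?", which is not the lookup the question needs; that is what a merge provides.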
Answer 0 (score: 0)
SOLUTION #1
You should use df.merge():
df = df.merge(i_df, how='left', left_on=['Release Date Year', 'Release Date Month'],
right_on=['Release Date Year', 'Release Date Month'])
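This will join your i_df dataframe to your df dataframe. It does a left join in this example (every row of df is kept, and rows with no matching year and month get NaN in the new columns), but feel free to change the join type. You will end up with a new df that contains the column you want.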
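A minimal sketch of Solution #1, assuming small hypothetical frames (the Title column and the CPIAUCNS numbers are invented for illustration). Each df row looks up its (year, month) pair in i_df, and the 2019-07 row, which has no match, gets NaN. The `validate="many_to_one"` argument is an optional addition, not part of the answer: it makes pandas raise a MergeError if i_df contains duplicate year/month pairs, which would otherwise silently duplicate rows of df.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"Title": ["A", "B", "C"],
                   "Release Date Year": [2018, 2018, 2019],
                   "Release Date Month": [1, 3, 7]})
i_df = pd.DataFrame({"Release Date Year": [2018, 2018, 2018],
                     "Release Date Month": [3, 1, 2],
                     "CPIAUCNS": [102.0, 100.0, 101.0]})

# Left join on the two key columns; rows of df keep their order.
merged = df.merge(i_df, how="left",
                  left_on=["Release Date Year", "Release Date Month"],
                  right_on=["Release Date Year", "Release Date Month"],
                  validate="many_to_one")  # optional: each key must be unique in i_df

print(merged)
```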
SOLUTION #2
Another solution would be to use your boolean filter to filter your i_df dataframe:
bool_filter = ((df['Release Date Year'] == i_df['Release Date Year'])
& (df['Release Date Month'] == i_df['Release Date Month']))
df['Price Level'] = i_df.loc[bool_filter, 'CPIAUCNS']
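Note that this assumes both dataframes have identical indexes (the same labels in the same order), because the comparison pairs each row of df with the row of i_df that has the same index label. Be careful if you cannot guarantee that: with different indexes, building bool_filter raises the same ValueError.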
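A sketch of Solution #2 under its own assumption, using hypothetical frames that share the default index 0, 1, 2. The selection is written with `.loc`, which picks the same rows as filtering with the boolean mask. Only rows whose year and month happen to line up position by position are filled; row 1 gets NaN because i_df has a different month at that position:

```python
import pandas as pd

# Hypothetical frames with the SAME index (0, 1, 2) and the same length.
df = pd.DataFrame({"Release Date Year": [2018, 2018, 2018],
                   "Release Date Month": [1, 2, 3]})
i_df = pd.DataFrame({"Release Date Year": [2018, 2018, 2018],
                     "Release Date Month": [1, 5, 3],
                     "CPIAUCNS": [100.0, 104.0, 102.0]})

# Row i of df is compared only with row i of i_df; no search across rows.
bool_filter = ((df["Release Date Year"] == i_df["Release Date Year"])
               & (df["Release Date Month"] == i_df["Release Date Month"]))

# Assignment aligns on the index, so rows where the filter is False become NaN.
df["Price Level"] = i_df.loc[bool_filter, "CPIAUCNS"]
print(df)
```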
Answer 1 (score: 0)
Based on Teddy's answer, I finally succeeded with the following merge statement:
df = df.merge(i_df[['Release Date Year', 'Release Date Month','CPIAUCNS']],
how='left', on=['Release Date Year', 'Release Date Month'])
I ended up with CPIAUCNS in my dataframe, which was my goal. Thanks, Teddy!
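A sketch of the same merge with one optional extra step, assuming hypothetical data: renaming CPIAUCNS to Price Level before merging, so the result carries the column name the question originally aimed for. Selecting only the key columns plus CPIAUCNS, as the statement above does, keeps other i_df columns (here a made-up Other Column) out of df:

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"Title": ["A", "B"],
                   "Release Date Year": [2018, 2018],
                   "Release Date Month": [1, 3]})
i_df = pd.DataFrame({"Release Date Year": [2018, 2018, 2018],
                     "Release Date Month": [1, 2, 3],
                     "CPIAUCNS": [100.0, 101.0, 102.0],
                     "Other Column": ["x", "y", "z"]})

keys = ["Release Date Year", "Release Date Month"]

# Keep only the keys and the value column, rename it, then left-join on the keys.
lookup = i_df[keys + ["CPIAUCNS"]].rename(columns={"CPIAUCNS": "Price Level"})
df = df.merge(lookup, how="left", on=keys)
print(df)
```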
However, I still don't understand the problem with my initial bool_filter.