I have two DataFrames in pandas that I want to join (I think with merge), but when I do, every column that comes from the right DataFrame is NaN in the result. Here's a simplified schematic:
DF_Left
    station_name  trips          date_zip
0  Mountain View    100  95113 2013-08-29
1  San Francisco    190  95113 2012-04-12
2       San Jose    109  94107 2013-09-01
DF_Right
   max_temperature  wind_speed          date_zip
0               79           2  95113 2013-08-29
1               67           3  95113 2012-04-12
2               64           1  94107 2013-09-01
There are about 40K rows on the left and 1,500 on the right. I want to merge the two so that the columns of DF_Right are added to DF_Left, matched on the date_zip column. So what I want is:
DF_Correct
    station_name  trips          date_zip  max_temperature  wind_speed
0  Mountain View    100  95113 2013-08-29               79           2
1  San Francisco    190  95113 2012-04-12               67           3
2       San Jose    109  94107 2013-09-01               64           1
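What the schematic above describes is a many-to-one left merge: many DF_Left rows can share a date_zip, and each date_zip should appear once in DF_Right. A minimal sketch, assuming the frames look like the schematic (the fourth DF_Left row is hypothetical, added so two rows share a key). The `validate='many_to_one'` option makes pandas raise a `MergeError` if a key is duplicated in DF_Right, instead of silently duplicating left rows.

```python
import pandas as pd

# Stand-ins for the question's frames; the 4th DF_Left row is hypothetical
DF_Left = pd.DataFrame({
    "station_name": ["Mountain View", "San Francisco", "San Jose", "San Jose"],
    "trips": [100, 190, 109, 80],
    "date_zip": ["95113 2013-08-29", "95113 2012-04-12",
                 "94107 2013-09-01", "95113 2013-08-29"],
})
DF_Right = pd.DataFrame({
    "max_temperature": [79, 67, 64],
    "wind_speed": [2, 3, 1],
    "date_zip": ["95113 2013-08-29", "95113 2012-04-12", "94107 2013-09-01"],
})

# how="left" keeps every DF_Left row in its original order;
# validate="many_to_one" raises if date_zip is not unique in DF_Right
DF_Correct = DF_Left.merge(DF_Right, on="date_zip", how="left",
                           validate="many_to_one")
print(DF_Correct)
```

Both rows keyed "95113 2013-08-29" receive the same weather values, which is the intended broadcast of the 1,500 weather rows onto the 40K trip rows.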
When I do
DF_Correct = pd.merge(DF_Left, DF_Right, left_on=['date_zip'], right_on=['date_zip'], how='left')
I get the structure I wanted, except that all of the weather columns are NaN. I'm not sure of the terminology here; I think merge is what I want, but I don't understand what is happening to my data.
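When a merge with how='left' returns every left row but leaves all of the right-hand columns NaN, it means no key matched. One common cause (an assumption here, since the raw data isn't shown) is keys that print identically but differ, for example by trailing whitespace. The sketch below reproduces that with hypothetical data and uses merge's `indicator=True` option, which adds a `_merge` column recording whether each row matched ('both') or not ('left_only').

```python
import pandas as pd

left = pd.DataFrame({"trips": [100, 190],
                     "date_zip": ["95113 2013-08-29", "95113 2012-04-12"]})
# Hypothetical: the right keys carry a trailing space (e.g. from a CSV export)
right = pd.DataFrame({"max_temperature": [79, 67],
                      "date_zip": ["95113 2013-08-29 ", "95113 2012-04-12 "]})

broken = left.merge(right, on="date_zip", how="left", indicator=True)
# Every row is "left_only", so max_temperature is all NaN
print(broken["_merge"].value_counts())

# Normalize the key the same way on both sides before merging
right["date_zip"] = right["date_zip"].str.strip()
fixed = left.merge(right, on="date_zip", how="left", indicator=True)
# Every row is now "both"
print(fixed["_merge"].value_counts())
```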
Answer 0 (score: 0)
Please inspect your data to make sure the key values and their dtypes match in both DataFrames. Below is the code I tried with your sample data; the test ran as expected.
import pandas as pd

# Sample data from the question, plus one extra left row with no weather match
df1 = pd.DataFrame({'station_name': ['Mountain View','San Francisco','San Jose','San Jose'],
                    'trips': [100,190,109,110],
                    'date_zip': ['95113 2013-08-29','95113 2012-04-12','94107 2013-09-01','94107 2013-09-02']})
df2 = pd.DataFrame({'wind_speed': [2,3,1],
                    'max_temperature': [79,67,64],
                    'date_zip': ['95113 2013-08-29','95113 2012-04-12','94107 2013-09-01']})

# how='left' keeps every row of df1; only the unmatched '94107 2013-09-02' row gets NaN weather values
DF_Correct = pd.merge(df1, df2, on='date_zip', how='left')
print(DF_Correct)
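To inspect the data as suggested, it can help to check the key columns before merging. `check_merge_keys` is a hypothetical helper written for this sketch, not a pandas function; it reports both key dtypes and how many distinct keys overlap, so a result of zero matching keys points straight at the all-NaN symptom.

```python
import pandas as pd


def check_merge_keys(left, right, key):
    """Hypothetical helper: summarize key dtypes and overlap before a merge."""
    left_keys = set(left[key].dropna())
    right_keys = set(right[key].dropna())
    return {
        "left_dtype": str(left[key].dtype),
        "right_dtype": str(right[key].dtype),
        "matching_keys": len(left_keys & right_keys),
        "left_keys_without_match": len(left_keys - right_keys),
    }


df1 = pd.DataFrame({"trips": [100, 190, 109, 110],
                    "date_zip": ["95113 2013-08-29", "95113 2012-04-12",
                                 "94107 2013-09-01", "94107 2013-09-02"]})
df2 = pd.DataFrame({"max_temperature": [79, 67, 64],
                    "date_zip": ["95113 2013-08-29", "95113 2012-04-12",
                                 "94107 2013-09-01"]})

report = check_merge_keys(df1, df2, "date_zip")
print(report)
```

If `matching_keys` is 0 on the real data, printing a few keys with `df[key].map(repr)` usually reveals hidden whitespace or formatting differences.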
Answer 1 (score: 0)
Based on my understanding of the question, the code below should give the desired result.
DF_Correct = pd.merge(DF_Right, DF_Left, how='left', on='date_zip')
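In this answer DF_Right is passed as the first argument, and with how='left' pandas keeps every row of whichever frame comes first, so argument order decides which unmatched rows survive and which column block comes first. The sketch below illustrates this with small frames; the '94107 2013-09-03' weather row is hypothetical, added so each frame has one key the other lacks.

```python
import pandas as pd

DF_Left = pd.DataFrame({
    "station_name": ["Mountain View", "San Francisco", "San Jose", "San Jose"],
    "trips": [100, 190, 109, 110],
    "date_zip": ["95113 2013-08-29", "95113 2012-04-12",
                 "94107 2013-09-01", "94107 2013-09-02"],
})
# The last weather row is hypothetical and has no matching trip row
DF_Right = pd.DataFrame({
    "max_temperature": [79, 67, 64, 70],
    "wind_speed": [2, 3, 1, 4],
    "date_zip": ["95113 2013-08-29", "95113 2012-04-12",
                 "94107 2013-09-01", "94107 2013-09-03"],
})

# Keeps every trip row; the unmatched trip gets NaN weather columns
keep_trips = pd.merge(DF_Left, DF_Right, on="date_zip", how="left")
# Keeps every weather row; the unmatched weather row gets NaN trip columns
keep_weather = pd.merge(DF_Right, DF_Left, on="date_zip", how="left")

print(keep_trips["date_zip"].tolist())
print(keep_weather["date_zip"].tolist())
```

For the question's goal (all 40K trip rows with weather attached), the trip frame has to be the first argument.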