Merging two DataFrames in Pandas results in NaNs in the new merged DF

Time: 2019-04-16 22:40:12

Tags: python pandas dataframe

I have two DataFrames in Pandas that I want to join together (I think with merge). When I do, the resulting DataFrame has all NaN in the columns that come from the right DataFrame. Here's a simplified schematic:

DF_Left

     station_name     trips    date_zip
0    Mountain View    100      95113 2013-08-29
1    San Francisco    190      95113 2012-04-12
2    San Jose         109      94107 2013-09-01

DF_Right

     max_temperature    wind_speed    date_zip
0    79                 2             95113 2013-08-29
1    67                 3             95113 2012-04-12
2    64                 1             94107 2013-09-01

There are about 40K rows on the left and 1,500 on the right. I want to merge the two so that the columns of DF_Right are added to DF_Left based on the date_zip column. So what I really want is

DF_Correct

     station_name     trips    date_zip            max_temperature    wind_speed
0    Mountain View    100      95113 2013-08-29    79                 2
1    San Francisco    190      95113 2012-04-12    67                 3
2    San Jose         109      94107 2013-09-01    64                 1

When I do

DF_Correct = pd.merge(DF_Left, DF_Right, left_on=['date_zip'], right_on=['date_zip'], how='left')

I get what I wanted, except that all of the weather columns are now NaN. I'm not sure about the terminology here; I think merge is what I want, but I'm not sure what's happening to my data.

2 answers:

Answer 0 (score: 0)

Please inspect your data to make sure the values and dtypes of date_zip are the same in both DataFrames. The code below, run on your sample data, merges correctly:

import pandas as pd
df1 = pd.DataFrame({'station_name': ['Mountain View','San Francisco','San Jose','San Jose'],
                   'trips': [100,190,109,110],
                   'date_zip': ['95113 2013-08-29','95113 2012-04-12','94107 2013-09-01','94107 2013-09-02']})
df2 = pd.DataFrame({'wind_speed': [2,3,1],
                   'max_temperature': [79,67,64],
                   'date_zip': ['95113 2013-08-29','95113 2012-04-12','94107 2013-09-01']})

DF_Correct = pd.merge(df1, df2, on='date_zip', how='left')
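
One way to do the inspection suggested above is sketched here. It is only an illustration under an assumed cause: the made-up left keys carry trailing whitespace, which is one common reason why keys that look identical when printed fail to match. The real data may differ in some other way (for example, a different dtype or date format), so the frame contents and the names df_left, df_right, before and after are hypothetical.

```python
import pandas as pd

# Hypothetical data: the left keys carry stray trailing whitespace, so they
# look the same as the right keys when printed but are not equal.
df_left = pd.DataFrame({
    'station_name': ['Mountain View', 'San Francisco', 'San Jose'],
    'trips': [100, 190, 109],
    'date_zip': ['95113 2013-08-29 ', '95113 2012-04-12 ', '94107 2013-09-01 '],
})
df_right = pd.DataFrame({
    'max_temperature': [79, 67, 64],
    'wind_speed': [2, 3, 1],
    'date_zip': ['95113 2013-08-29', '95113 2012-04-12', '94107 2013-09-01'],
})

# Step 1: compare the key dtypes and look at raw values with repr(),
# which makes hidden whitespace visible.
print(df_left['date_zip'].dtype, df_right['date_zip'].dtype)
print(repr(df_left['date_zip'].iloc[0]), repr(df_right['date_zip'].iloc[0]))

# Step 2: indicator=True adds a '_merge' column that records whether each
# row was found in both frames or only in the left one.
before = pd.merge(df_left, df_right, on='date_zip', how='left', indicator=True)
print(before['_merge'].value_counts())

# Step 3: normalize both keys to stripped strings, then merge again.
for df in (df_left, df_right):
    df['date_zip'] = df['date_zip'].astype(str).str.strip()

after = pd.merge(df_left, df_right, on='date_zip', how='left', indicator=True)
print(after['_merge'].value_counts())
```

If the '_merge' column reports left_only for every row, the keys never matched, and the difference between the two date_zip columns is the place to look.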

Answer 1 (score: 0)

Based on my understanding of the question, the code below should give the desired answer.

DF_Correct = pd.merge(DF_Right, DF_Left, how='left', on='date_zip')
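
Note that with how='left' the order of the two frames matters: the first argument is the one whose rows are all kept. The sketch below uses made-up data (four station rows, weather for only three keys) to show the difference; the names df_left, df_right, stations_first and weather_first are hypothetical.

```python
import pandas as pd

# Hypothetical data: four station rows but weather for only three keys.
df_left = pd.DataFrame({
    'station_name': ['Mountain View', 'San Francisco', 'San Jose', 'San Jose'],
    'trips': [100, 190, 109, 110],
    'date_zip': ['95113 2013-08-29', '95113 2012-04-12',
                 '94107 2013-09-01', '94107 2013-09-02'],
})
df_right = pd.DataFrame({
    'max_temperature': [79, 67, 64],
    'wind_speed': [2, 3, 1],
    'date_zip': ['95113 2013-08-29', '95113 2012-04-12', '94107 2013-09-01'],
})

# Stations first: every station row is kept, and a station row with no
# matching weather gets NaN in the weather columns.
stations_first = pd.merge(df_left, df_right, how='left', on='date_zip')

# Weather first: every weather row is kept, and a station row with no
# matching weather is dropped from the result.
weather_first = pd.merge(df_right, df_left, how='left', on='date_zip')

print(len(stations_first), len(weather_first))
print(stations_first.columns.tolist())
print(weather_first.columns.tolist())
```

In the question, the goal is to keep all rows of DF_Left, which corresponds to passing DF_Left first. Swapping the order does not by itself make keys match that did not match before.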