In pandas

Time: 2015-07-24 08:44:17

Tags: python python-2.7 pandas merge gps

I have two DataFrames: GPS coordinates

               Time             X             Y             Z
2013-06-01 00:00:00  13512.466575 -12220.845913  19279.970720
2013-06-01 00:00:00 -13529.778408 -14013.560399 -18060.112972
2013-06-01 00:00:00  25108.907276   8764.536182   1594.215305
2013-06-01 00:00:00  -8436.586675 -22468.562354 -11354.726511
2013-06-01 00:05:00  13559.288748 -11476.738832  19702.063737
2013-06-01 00:05:00 -13500.120049 -14702.564328 -17548.488127
2013-06-01 00:05:00  25128.357948   8883.802142    664.732379
2013-06-01 00:05:00  -8346.854582 -22878.993160 -10544.640975

and Glonass coordinates

               Time                    X                    Y                    Z
2013-06-01 00:00:00   0.248752905273E+05  -0.557450976562E+04  -0.726176757812E+03 
2013-06-01 00:15:00   0.148314306641E+05   0.510153710938E+04   0.201156157227E+05
2013-06-01 00:15:00   0.242346674805E+05  -0.562089208984E+04   0.561714257812E+04  
2013-06-01 00:15:00   0.195601284180E+05  -0.122148081055E+05  -0.108823476562E+05 
2013-06-01 00:15:00   0.336192968750E+04  -0.122589394531E+05  -0.220986958008E+05      

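I need to merge them on the Time column, to get the coordinates of the satellites at the same time (I need all GPS coordinates and all Glonass coordinates for a given time). The result for the example above should look like this: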

                 Time         X_gps         Y_gps         Z_gps           X_glonass            Y_glonass            Z_glonass 
0 2013-06-01 00:00:00  13512.466575 -12220.845913  19279.970720  0.248752905273E+05  -0.557450976562E+04  -0.726176757812E+03   
1 2013-06-01 00:00:00 -13529.778408 -14013.560399 -18060.112972     
2 2013-06-01 00:00:00  25108.907276   8764.536182   1594.215305    
3 2013-06-01 00:00:00  -8436.586675 -22468.562354 -11354.726511        

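What I ended up doing is coord = pd.merge(d_gps, d_glonass, on = 'Time', how = 'inner', suffixes = ('_gps','_glonass')), but it duplicates the Glonass coordinates to fill the empty cells in the DataFrame. What should I change to get the result I want? I'm new to pandas, so I would really appreciate your help.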

1 answer:

Answer 0 (score: 1)

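After merging (I took the liberty of renaming the columns first), you can iterate over the columns, test each one with duplicated, and set the duplicates to NaN. You cannot set them to blank, because the column dtype is float and assigning an empty string raises an invalid literal error: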

In [272]:
df1 = df1.rename(columns={'X':'X_glonass', 'Y':'Y_glonass', 'Z':'Z_glonass'})
df = df.rename(columns={'X':'X_gps', 'Y':'Y_gps', 'Z':'Z_gps'})
merged = df.merge(df1, on='Time')

In [278]:
for col in merged.columns[1:]:
    merged.loc[merged[col].duplicated(), col] = np.nan
merged

Out[278]:
        Time         X_gps         Y_gps         Z_gps     X_glonass  \
0 2013-06-01  13512.466575 -12220.845913  19279.970720  24875.290527   
1 2013-06-01 -13529.778408 -14013.560399 -18060.112972           NaN   
2 2013-06-01  25108.907276   8764.536182   1594.215305           NaN   
3 2013-06-01  -8436.586675 -22468.562354 -11354.726511           NaN   

     Y_glonass   Z_glonass  
0 -5574.509766 -726.176758  
1          NaN         NaN  
2          NaN         NaN  
3          NaN         NaN
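
Note that duplicated compares values across the whole column, so when a timestamp has several Glonass rows, the surviving values can end up scattered over non-adjacent rows. As a possible alternative (not part of the original answer), the sketch below numbers the rows within each timestamp using groupby(...).cumcount() and merges on both Time and that counter, so no duplicates are created in the first place. The frames here are shortened stand-ins for the question's data, and the helper column name n is made up for illustration.

```python
import pandas as pd

# Shortened stand-ins for the question's frames (only the X column, rounded values).
d_gps = pd.DataFrame({
    "Time": pd.to_datetime(["2013-06-01 00:00:00"] * 4 + ["2013-06-01 00:05:00"] * 2),
    "X": [13512.47, -13529.78, 25108.91, -8436.59, 13559.29, -13500.12],
})
d_glonass = pd.DataFrame({
    "Time": pd.to_datetime(["2013-06-01 00:00:00"]),
    "X": [24875.29],
})

# Hypothetical helper column "n": position of each row within its timestamp (0, 1, 2, ...).
d_gps["n"] = d_gps.groupby("Time").cumcount()
d_glonass["n"] = d_glonass.groupby("Time").cumcount()

# An outer merge on (Time, n) pairs rows by position instead of forming a cross product.
coord = pd.merge(d_gps, d_glonass, on=["Time", "n"], how="outer",
                 suffixes=("_gps", "_glonass"))

# Assumption: keep only timestamps present in both frames, like the inner merge in the question.
in_both = coord["Time"].isin(d_gps["Time"]) & coord["Time"].isin(d_glonass["Time"])
coord = (coord[in_both]
         .sort_values(["Time", "n"])
         .drop(columns="n")
         .reset_index(drop=True))

print(coord)
```

With this layout every GPS row for a shared timestamp is kept, and each Glonass row appears exactly once, with NaN filling the positions where one system has fewer satellites than the other.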