Pandas - Using Df1 to search and modify Df based on mismatched time series

Time: 2015-03-24 10:39:18

Tags: python pandas

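I am struggling to find a way to iterate over Df and, for each row, apply a function that searches Df1 for the closest match (the aim being to add data from Df1 to Df). I have read and tried a lot of the approaches found here, but without success. I would appreciate some pointers, especially if I am going down the wrong route: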

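Df (example) - for each row, take the datetime (I have removed it as the index), search Df1 for the closest time match, and add the result (in this case Offshore_Umgeni_Deep at 09:03) to Df['new column'].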

       datetime                    cond06     temp03     pres07  
0      2015-02-26 09:03:38.833000  49.448935  22.162381  10.909805   
1             2015-02-26 09:03:39  50.098050  22.162781  10.885601   
2      2015-02-26 09:03:39.167000  50.060446  22.164354  10.807413   
3      2015-02-26 09:03:39.333000  50.239644  22.156575  10.788496   
4      2015-02-26 09:03:39.500000  50.179168  22.160942  10.803082   

Df1:

      datetime             S         E         Location_Name
0     2015-02-26 09:01:00  29.81192  31.04692  Offshore_Umgeni_Deep
1     2015-02-26 09:01:00  29.81176  31.04688  Offshore_Umgeni_Deep
2     2015-02-26 09:01:00  29.81159  31.04682  Offshore_Umgeni_Deep
3     2015-02-26 09:02:00  29.81140  31.04676  Offshore_Umgeni_Deep
4     2015-02-26 09:02:00  29.81127  31.04673  Offshore_Umgeni_Deep
5     2015-02-26 09:02:00  29.81116  31.04671  Offshore_Umgeni_Deep
6     2015-02-26 09:02:00  29.81110  31.04670  Offshore_Umgeni_Deep
7     2015-02-26 09:02:00  29.81109  31.04673  Offshore_Umgeni_Deep
8     2015-02-26 09:02:00  29.81107  31.04674  Offshore_Umgeni_Deep
9     2015-02-26 09:02:00  29.81105  31.04673  Offshore_Umgeni_Deep
10    2015-02-26 09:02:00  29.81103  31.04673  Offshore_Umgeni_Deep
11    2015-02-26 09:02:00  29.81103  31.04672  Offshore_Umgeni_Deep
12    2015-02-26 09:02:00  29.81103  31.04669  Offshore_Umgeni_Deep
13    2015-02-26 09:03:00  29.81102  31.04666  Offshore_Umgeni_Deep
14    2015-02-26 09:03:00  29.81103  31.04664  Offshore_Umgeni_Deep
15    2015-02-26 09:03:00  29.81104  31.04663  Offshore_Umgeni_Deep
16    2015-02-26 09:03:00  29.81105  31.04661  Offshore_Umgeni_Deep
17    2015-02-26 09:03:00  29.81106  31.04660  Offshore_Umgeni_Deep
18    2015-02-26 09:03:00  29.81107  31.04657  Offshore_Umgeni_Deep
19    2015-02-26 09:03:00  29.81109  31.04655  Offshore_Umgeni_Deep
20    2015-02-26 09:03:00  29.81110  31.04653  Offshore_Umgeni_Deep
21    2015-02-26 09:03:00  29.81111  31.04650  Offshore_Umgeni_Deep
22    2015-02-26 09:04:00  29.81113  31.04649  Offshore_Umgeni_Deep
23    2015-02-26 09:04:00  29.81114  31.04647  Offshore_Umgeni_Deep
24    2015-02-26 09:04:00  29.81116  31.04646  Offshore_Umgeni_Deep
25    2015-02-26 09:04:00  29.81117  31.04642  Offshore_Umgeni_Deep
26    2015-02-26 09:04:00  29.81118  31.04640  Offshore_Umgeni_Deep

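I could match on HH:MM alone, but I might miss data, so I have been trying a least-error approach instead. After many attempts, I think the best option is to iterate over Df and, for each row, search through Df1: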

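import pandas as pd

# Maximum allowed time difference (value assumed; 'error' was not defined in the post)
error = pd.Timedelta(seconds=60)

def timesearch(t, df1):
    # Absolute time difference between t and every row of df1
    # (assumes both 'datetime' columns are already datetime64)
    diff = (df1['datetime'] - t).abs()
    nearest = diff.idxmin()
    # Accept the closest row only if it lies within the allowed error
    if diff.loc[nearest] < error:
        return df1.loc[nearest, 'Location_Name']
    return None

# For each row of df, look up the closest row in df1
df['Location_Name'] = df['datetime'].apply(timesearch, args=(df1,))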

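I realize this may not be quite right, because while iterating I need to keep track of which Df1 row maps back to each Df row. Hopefully I have managed to get the idea across, though.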
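
One possible way to get the same nearest-time lookup without an explicit loop is `pd.merge_asof`, sketched below. This is only an illustration, not something from the post: it assumes pandas 0.20 or later (where `merge_asof` accepts `direction`), the two small frames are stand-ins built from a few values in the tables above, and the 60-second `error` tolerance is an assumed value.

```python
import pandas as pd

# Small stand-ins for Df and Df1, using a few values from the tables above.
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2015-02-26 09:03:38.833",
        "2015-02-26 09:03:39.000",
        "2015-02-26 09:03:39.167",
    ]),
    "cond06": [49.448935, 50.098050, 50.060446],
})
df1 = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2015-02-26 09:02:00",
        "2015-02-26 09:03:00",
        "2015-02-26 09:04:00",
    ]),
    "Location_Name": ["Offshore_Umgeni_Deep"] * 3,
})

# Assumed tolerance: rows of df with no df1 timestamp within 60 s get NaN.
error = pd.Timedelta(seconds=60)

# merge_asof requires both frames to be sorted on the key column.
result = pd.merge_asof(
    df.sort_values("datetime"),
    df1[["datetime", "Location_Name"]].sort_values("datetime"),
    on="datetime",
    direction="nearest",   # closest timestamp on either side
    tolerance=error,
)
print(result)
```

Note that with `direction="nearest"` the sample rows around 09:03:39 are matched to the 09:04:00 row of Df1, since that is closer in time. With `direction="backward"` each row would instead take the most recent Df1 timestamp at or before it, which for this sample is 09:03:00, the same result as matching on HH:MM.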

0 answers:

No answers yet