Select and merge pandas DataFrames (dates)

Time: 2016-08-02 15:14:47

Tags: python pandas dataframe merge

I have two DataFrames, and I need to select data from the first one and merge it with the second. Consider the first, df1:

         ob_time air_temperature
0   2016-02-01 00:00            11.2
4   2016-02-01 01:00            11.1
8   2016-02-01 02:00            11.1
12  2016-02-01 03:00            10.8
16  2016-02-01 04:00            10.6
20  2016-02-01 05:00            10.8
24  2016-02-01 06:00            10.9
28  2016-02-01 07:00            10.7
32  2016-02-01 08:00            10.2
36  2016-02-01 09:00            10.9
44  2016-02-01 10:00              11
48  2016-02-01 11:00            11.5
52  2016-02-01 12:00            11.6
56  2016-02-01 13:00            12.7
60  2016-02-01 14:00            12.9
64  2016-02-01 15:00            12.6
68  2016-02-01 16:00              12
72  2016-02-01 17:00            11.1
76  2016-02-01 18:00            10.7
80  2016-02-01 19:00             9.5
84  2016-02-01 20:00             8.9
88  2016-02-01 21:00               9
92  2016-02-01 22:00             8.5
96  2016-02-01 23:00             8.7

705  2016-02-08 00:00               9
709  2016-02-08 01:00             8.9
713  2016-02-08 02:00             6.3
717  2016-02-08 03:00             6.6
721  2016-02-08 04:00             6.1
725  2016-02-08 05:00             5.3
729  2016-02-08 06:00             5.6
733  2016-02-08 07:00             5.1
737  2016-02-08 08:00             4.8
741  2016-02-08 09:00             6.3
750  2016-02-08 10:00               7
754  2016-02-08 11:00             7.4
758  2016-02-08 12:00             7.5
762  2016-02-08 13:00             7.9
766  2016-02-08 14:00             8.3
770  2016-02-08 15:00             7.5
774  2016-02-08 16:00             8.4
778  2016-02-08 17:00             7.7
782  2016-02-08 18:00             7.7
786  2016-02-08 19:00             7.5
790  2016-02-08 20:00               7
794  2016-02-08 21:00             6.5
798  2016-02-08 22:00               6
802  2016-02-08 23:00             5.6

And the second, df2:

        summary  participant_id           response_date
156741     15.0              27 2016-02-01 11:38:22.816
157436     20.0              27 2016-02-08 13:19:10.496

I need to select rows from df1 and attach them to df2 like this:

        summary  participant_id           response_date           ob_time  air_temperature
156741     15.0              27 2016-02-01 11:38:22.816  2016-02-01 11:00             11.5
157436     20.0              27 2016-02-08 13:19:10.496  2016-02-08 13:00              7.9

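The idea is simple: merge the two DataFrames by matching "response_date" to "ob_time", so that "air_temperature" (and "ob_time") follow "response_date" in each row.

I am switching from MATLAB to pandas and am struggling to find the Pythonic way to do this. I believe pandas has a very simple function that handles it. Any help would be highly appreciated.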

1 answer:

Answer 0: (score: 2)

You can use merge:

import pandas as pd

# convert to datetime if the dtypes are not already datetime
df1['ob_time'] = pd.to_datetime(df1.ob_time)
df2['response_date'] = pd.to_datetime(df2.response_date)

# truncate to the hour: set minutes, seconds and microseconds to 0
# http://stackoverflow.com/a/28783971/2901002
df2['ob_time'] = df2.response_date.values.astype('<M8[h]')
print(df2)

        summary  participant_id           response_date             ob_time
156741     15.0              27 2016-02-01 11:38:22.816 2016-02-01 11:00:00
157436     20.0              27 2016-02-08 13:19:10.496 2016-02-08 13:00:00

print (pd.merge(df1,df2, on=['ob_time']))
              ob_time  air_temperature  summary  participant_id  \
0 2016-02-01 11:00:00             11.5     15.0              27   
1 2016-02-08 13:00:00              7.9     20.0              27   

            response_date  
0 2016-02-01 11:38:22.816  
1 2016-02-08 13:19:10.496  
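
As a self-contained sketch of the same idea: the frames below are a small reconstruction of the question's data (only four rows of df1), and the example assumes a pandas version that provides `Series.dt.floor`. It uses a left merge from df2 so that the columns come out in the order the question asked for.

```python
import pandas as pd

# Small reconstruction of the question's data (only a few rows of df1).
df1 = pd.DataFrame({
    "ob_time": ["2016-02-01 11:00", "2016-02-01 12:00",
                "2016-02-08 13:00", "2016-02-08 14:00"],
    "air_temperature": [11.5, 11.6, 7.9, 8.3],
})
df2 = pd.DataFrame({
    "summary": [15.0, 20.0],
    "participant_id": [27, 27],
    "response_date": ["2016-02-01 11:38:22.816", "2016-02-08 13:19:10.496"],
})

df1["ob_time"] = pd.to_datetime(df1["ob_time"])
df2["response_date"] = pd.to_datetime(df2["response_date"])

# Truncate each response to the start of its hour.
df2["ob_time"] = df2["response_date"].dt.floor("h")

# A left merge keeps every row of df2 and puts df2's columns first.
result = df2.merge(df1, on="ob_time", how="left")
print(result)
```

Note that merging on a column returns a new default integer index, so the original df2 index labels (156741, 157436) are not carried over. With `how="left"`, a response whose hour has no observation in df1 is kept with a missing air_temperature instead of being dropped.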

The older approach, using replace:

df2['ob_time'] = df2.response_date.apply(
    lambda x: x.replace(minute=0, second=0, microsecond=0))
print(df2)
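
Truncating to the hour only finds a match when the observations fall exactly on the hour. If that is not guaranteed, one possible alternative, which is not part of the original answer, is `pd.merge_asof`: it matches each response to the most recent earlier observation. The sketch below again uses a small reconstruction of the question's data, and the one-hour `tolerance` is an assumption chosen for illustration.

```python
import pandas as pd

# Small reconstruction of the question's data (only a few rows of df1).
df1 = pd.DataFrame({
    "ob_time": pd.to_datetime(["2016-02-01 11:00", "2016-02-01 12:00",
                               "2016-02-08 13:00", "2016-02-08 14:00"]),
    "air_temperature": [11.5, 11.6, 7.9, 8.3],
})
df2 = pd.DataFrame({
    "summary": [15.0, 20.0],
    "participant_id": [27, 27],
    "response_date": pd.to_datetime(["2016-02-01 11:38:22.816",
                                     "2016-02-08 13:19:10.496"]),
})

# merge_asof needs both keys to have the same dtype and to be sorted.
df1["ob_time"] = df1["ob_time"].astype("datetime64[ns]")
df2["response_date"] = df2["response_date"].astype("datetime64[ns]")
df1 = df1.sort_values("ob_time")
df2 = df2.sort_values("response_date")

# For each response, take the latest observation at or before it,
# but only if it is at most one hour old (illustrative tolerance).
result = pd.merge_asof(
    df2, df1,
    left_on="response_date", right_on="ob_time",
    direction="backward",
    tolerance=pd.Timedelta("1h"),
)
print(result)
```

Responses with no observation inside the tolerance window are kept, with missing values in the ob_time and air_temperature columns.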