I have two data frames: one contains only company names and dates, the other contains only timestamps. They look like this:
creationdate
0 2012-05-01 18:20:27.167000
1 2012-05-01 19:16:08.070000
2 2012-05-01 19:20:07.880000
3 2012-05-01 19:33:02.200000
4 2012-05-01 19:35:09.173000
5 2012-05-01 20:18:55.610000
6 2012-05-01 20:26:27.577000
7 2012-05-01 20:32:34.343000
8 2012-05-01 20:39:31.257000
9 2012-05-01 21:04:50.357000
10 2012-05-01 21:54:18.983000
11 2012-05-02 02:23:53.250000
12 2012-05-02 02:40:27.643000
13 2012-05-02 08:44:28.260000
and
sitename date
0 Google 2012-05-01
1 Google 2012-05-02
2 Google 2012-05-03
3 Google 2012-05-04
4 Google 2012-05-05
5 Google 2012-05-06
6 Google 2012-05-07
7 Google 2012-05-08
8 Google 2012-05-09
9 Google 2012-05-10
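How can I efficiently iterate over the second data frame and, for each date in it, extract the corresponding timestamps from the first data frame?
Answer 0 (score: 2)
Merging the two data frames with an inner join should work: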
In [96]: df1['date'] = pd.DatetimeIndex(df1.creationdate).date
In [97]: df2['date'] = pd.DatetimeIndex(df2.date).date
In [98]: df=df1.merge(df2, on='date', how='inner')
In [99]: df
Out[99]:
creationdate date sitename
0 2012-05-01 18:20:27.167000 2012-05-01 Google
1 2012-05-01 19:16:08.070000 2012-05-01 Google
2 2012-05-01 19:20:07.880000 2012-05-01 Google
3 2012-05-01 19:33:02.200000 2012-05-01 Google
4 2012-05-01 19:35:09.173000 2012-05-01 Google
5 2012-05-01 20:18:55.610000 2012-05-01 Google
6 2012-05-01 20:26:27.577000 2012-05-01 Google
7 2012-05-01 20:32:34.343000 2012-05-01 Google
8 2012-05-01 20:39:31.257000 2012-05-01 Google
9 2012-05-01 21:04:50.357000 2012-05-01 Google
10 2012-05-01 21:54:18.983000 2012-05-01 Google
11 2012-05-02 02:23:53.250000 2012-05-02 Google
12 2012-05-02 02:40:27.643000 2012-05-02 Google
13 2012-05-02 08:44:28.260000 2012-05-02 Google
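For anyone who wants to reproduce this without the original data, here is a minimal self-contained sketch of the same inner join. The sample rows are a shortened copy of the question's data, and using `pd.to_datetime` with `.dt.normalize()` to build the join key is my own substitution for the `pd.DatetimeIndex(...).date` calls above, not something from the original answer.

```python
import pandas as pd

# Shortened sample modeled on the question's data, stored as strings
# (as they would be after reading from a text file).
df1 = pd.DataFrame({"creationdate": [
    "2012-05-01 18:20:27.167000",
    "2012-05-01 19:16:08.070000",
    "2012-05-02 02:23:53.250000",
    "2012-05-02 08:44:28.260000",
]})
df2 = pd.DataFrame({
    "sitename": ["Google"] * 3,
    "date": ["2012-05-01", "2012-05-02", "2012-05-03"],
})

# Parse the strings into datetime columns.
df1["creationdate"] = pd.to_datetime(df1["creationdate"])
df2["date"] = pd.to_datetime(df2["date"])

# Truncate each timestamp to midnight so both frames share a comparable key.
df1["date"] = df1["creationdate"].dt.normalize()

# Inner join: dates with no timestamps (2012-05-03 here) drop out.
merged = df1.merge(df2, on="date", how="inner")
print(merged)
```

Because the join is an inner join, a date in `df2` with no matching timestamps simply does not appear in the result; `how='right'` would keep it with a missing `creationdate` instead.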
Then you can run operations on df, such as:
In [100]: df['time_diff'] = df.creationdate.diff()
In [101]: df.time_diff
Out[101]:
0 NaT
1 00:55:40.903000
2 00:03:59.810000
3 00:12:54.320000
4 00:02:06.973000
5 00:43:46.437000
6 00:07:31.967000
7 00:06:06.766000
8 00:06:56.914000
9 00:25:19.100000
10 00:49:28.626000
11 04:29:34.267000
12 00:16:34.393000
13 06:04:00.617000
Name: time_diff, dtype: timedelta64[ns]
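Note that the differences in the output above run across day boundaries (row 11 is measured from the last timestamp of 2012-05-01). If you would rather restart the difference for each date, one possible variation is to group by the date first. This is only a sketch on a four-row sample taken from the data above; the `time_diff_per_day` column name is made up for illustration.

```python
import pandas as pd

# Four timestamps from the question: two on each of two days.
df = pd.DataFrame({
    "creationdate": pd.to_datetime([
        "2012-05-01 21:04:50.357",
        "2012-05-01 21:54:18.983",
        "2012-05-02 02:23:53.250",
        "2012-05-02 02:40:27.643",
    ]),
})
df["date"] = df["creationdate"].dt.normalize()

# Plain diff: only the very first row is NaT.
df["time_diff"] = df["creationdate"].diff()

# Grouped diff: the first row of every date is NaT.
df["time_diff_per_day"] = df.groupby("date")["creationdate"].diff()

print(df[["creationdate", "time_diff", "time_diff_per_day"]])
```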
Of course, your creationdate column needs to be datetime64[ns], NOT a string. Otherwise you need to convert it first with pd.DatetimeIndex(df.creationdate).
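As a rough sketch of that check and conversion: the snippet below tests the column's dtype and converts only when needed. It uses `pd.to_datetime` and `pandas.api.types.is_datetime64_any_dtype`, which are my choice here rather than part of the answer; `pd.DatetimeIndex(df.creationdate)` should serve the same purpose.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Two timestamps from the question, deliberately left as strings.
df = pd.DataFrame({"creationdate": [
    "2012-05-01 18:20:27.167000",
    "2012-05-01 19:16:08.070000",
]})

# Convert only if the column is not already a datetime column.
if not is_datetime64_any_dtype(df["creationdate"]):
    df["creationdate"] = pd.to_datetime(df["creationdate"])

print(df["creationdate"].dtype)

# With a datetime column, diff() yields timedeltas.
print(df["creationdate"].diff())
```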