I have a dataframe like this:
df[['timestamp_utc','minute_ts','delta']].head()
Out[47]:
timestamp_utc minute_ts delta
0 2015-05-21 14:06:33.414 2015-05-21 12:06:00 -1 days +21:59:26.586000
1 2015-05-21 14:06:33.414 2015-05-21 12:07:00 -1 days +22:00:26.586000
2 2015-05-21 14:06:33.414 2015-05-21 12:08:00 -1 days +22:01:26.586000
3 2015-05-21 14:06:33.414 2015-05-21 12:09:00 -1 days +22:02:26.586000
4 2015-05-21 14:06:33.414 2015-05-21 12:10:00 -1 days +22:03:26.586000
df['delta']=df.minute_ts-df.timestamp_utc
timestamp_utc datetime64[ns]
minute_ts datetime64[ns]
delta timedelta64[ns]
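The problem is that I want to get the (possibly negative) number of minutes between timestamp_utc and minute_ts, ignoring the seconds component.
So for the first row I would like to get -120. Indeed, ignoring seconds, 2015-05-21 12:06:00 is 120 minutes before 2015-05-21 14:06:33.414.
What is the most pandas-idiomatic way to do this?
Many thanks!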
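As a quick sanity check of the arithmetic in the question, here is a minimal sketch (assuming a reasonably recent pandas) that recomputes the first row's delta from the two timestamps shown above. The value pandas prints as -1 days +21:59:26.586000 is just minus 2 h 0 min 33.414 s, i.e. about -120.56 minutes; dropping the 33.414-second part leaves the -120 the question asks for.

```python
import pandas as pd

# Timestamps copied from row 0 of the output above
minute_ts = pd.Timestamp("2015-05-21 12:06:00")
timestamp_utc = pd.Timestamp("2015-05-21 14:06:33.414")

delta = minute_ts - timestamp_utc
# pandas displays a negative Timedelta as "-1 days" plus a positive remainder
print(delta)

# The same duration written as a plain negative offset
assert delta == -pd.Timedelta(hours=2, seconds=33, milliseconds=414)

# Total minutes as a float; the fractional part is the seconds component
minutes = delta.total_seconds() / 60
print(minutes)
```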
Answer 0 (score: 3)
You can use:
import numpy as np

df['a'] = df['delta'] / np.timedelta64(1, 'm')
print (df)
timestamp_utc minute_ts delta \
0 2015-05-21 14:06:33.414 2015-05-21 12:06:00 -1 days +21:59:26.586000
1 2015-05-21 14:06:33.414 2015-05-21 12:07:00 -1 days +22:00:26.586000
2 2015-05-21 14:06:33.414 2015-05-21 12:08:00 -1 days +22:01:26.586000
3 2015-05-21 14:06:33.414 2015-05-21 12:09:00 -1 days +22:02:26.586000
4 2015-05-21 14:06:33.414 2015-05-21 12:10:00 -1 days +22:03:26.586000
a
0 -120.5569
1 -119.5569
2 -118.5569
3 -117.5569
4 -116.5569
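Dividing a timedelta64 column by np.timedelta64(1, 'm') expresses each value in units of one minute. A self-contained sketch that rebuilds the example frame from the values shown in the question (the pd.date_range construction is an assumption about how to reproduce it, not the asker's code) and checks the result against Series.dt.total_seconds() / 60, which should give the same floats:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the values shown in the question
df = pd.DataFrame({
    "timestamp_utc": [pd.Timestamp("2015-05-21 14:06:33.414")] * 5,
    "minute_ts": pd.date_range("2015-05-21 12:06:00", periods=5, freq="min"),
})
df["delta"] = df.minute_ts - df.timestamp_utc

# Timedelta divided by a one-minute timedelta -> float minutes
df["a"] = df["delta"] / np.timedelta64(1, "m")

# Equivalent formulation via the .dt accessor on a timedelta Series
alt = df["delta"].dt.total_seconds() / 60
assert np.allclose(df["a"], alt)
print(df["a"])
```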
Then convert the float to int:
df['a'] = (df['delta'] / np.timedelta64(1, 'm')).astype(int)
print (df)
timestamp_utc minute_ts delta a
0 2015-05-21 14:06:33.414 2015-05-21 12:06:00 -1 days +21:59:26.586000 -120
1 2015-05-21 14:06:33.414 2015-05-21 12:07:00 -1 days +22:00:26.586000 -119
2 2015-05-21 14:06:33.414 2015-05-21 12:08:00 -1 days +22:01:26.586000 -118
3 2015-05-21 14:06:33.414 2015-05-21 12:09:00 -1 days +22:02:26.586000 -117
4 2015-05-21 14:06:33.414 2015-05-21 12:10:00 -1 days +22:03:26.586000 -116
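One detail worth knowing: astype(int) truncates toward zero rather than flooring, which is why -120.5569 becomes -120 (the value the question wants) and not -121. A small sketch contrasting it with np.floor on a few illustrative values:

```python
import numpy as np
import pandas as pd

minutes = pd.Series([-120.5569, -0.5, 0.5, 116.9])

# astype(int) drops the fractional part, i.e. rounds toward zero
truncated = minutes.astype(int)
# np.floor rounds toward negative infinity before the cast
floored = np.floor(minutes).astype(int)

print(pd.DataFrame({"minutes": minutes, "astype(int)": truncated, "floor": floored}))
```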
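Answer 1 (score: 1)
You can use Timedelta objects in pandas and then use floor division in a list comprehension to compute the number of minutes. Note that the seconds attribute of a Timedelta returns a number of seconds that is >= 0 and less than one day, so you have to convert the days into the corresponding number of minutes explicitly.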
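To see why the days have to be handled explicitly, the sketch below inspects the components of the first row's negative Timedelta. pandas normalizes it as -1 day plus a non-negative remainder, so .days is -1 and .seconds holds that remainder in whole seconds (the milliseconds end up in .microseconds):

```python
import pandas as pd

td = pd.Timestamp("2015-05-21 12:06:00") - pd.Timestamp("2015-05-21 14:06:33.414")

# -1 day plus a positive remainder of roughly 22 hours
print(td.days, td.seconds, td.microseconds)

# Ignoring .days would give a large positive number of minutes...
naive = td.seconds // 60
# ...so the days are converted to minutes and added back, as in the answer
minutes = td.days * 24 * 60 + td.seconds // 60
print(naive, minutes)
```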
import pandas as pd

df = pd.DataFrame({'minute_ts': [pd.Timestamp('2015-05-21 12:06:00'),
pd.Timestamp('2015-05-21 12:07:00'),
pd.Timestamp('2015-05-21 12:08:00'),
pd.Timestamp('2015-05-21 12:09:00'),
pd.Timestamp('2015-05-21 12:10:00')],
'timestamp_utc': [pd.Timestamp('2015-05-21 14:06:33.414')] * 5})
df['minutes_neg'] = [td.days * 24 * 60 + td.seconds//60
for td in [pd.Timedelta(delta)
for delta in df.minute_ts - df.timestamp_utc]]
df['minutes_pos'] = [td.days * 24 * 60 + td.seconds//60
for td in [pd.Timedelta(delta)
for delta in df.timestamp_utc - df.minute_ts]]
>>> df
minute_ts timestamp_utc minutes_neg minutes_pos
0 2015-05-21 12:06:00 2015-05-21 14:06:33.414 -121 120
1 2015-05-21 12:07:00 2015-05-21 14:06:33.414 -120 119
2 2015-05-21 12:08:00 2015-05-21 14:06:33.414 -119 118
3 2015-05-21 12:09:00 2015-05-21 14:06:33.414 -118 117
4 2015-05-21 12:10:00 2015-05-21 14:06:33.414 -117 116
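The nested list comprehension can probably be replaced by a vectorized floor division of the timedelta column by a one-minute Timedelta. A sketch, assuming a pandas version where a timedelta64 Series // Timedelta returns integer counts (it floors, so it should reproduce the -121 ... -117 and 120 ... 116 values above):

```python
import pandas as pd

df = pd.DataFrame({
    "minute_ts": pd.date_range("2015-05-21 12:06:00", periods=5, freq="min"),
    "timestamp_utc": [pd.Timestamp("2015-05-21 14:06:33.414")] * 5,
})

one_minute = pd.Timedelta(minutes=1)

# Floor division of a timedelta64 Series by a Timedelta yields whole minutes,
# rounded toward negative infinity like Python's // operator
df["minutes_neg"] = (df.minute_ts - df.timestamp_utc) // one_minute
df["minutes_pos"] = (df.timestamp_utc - df.minute_ts) // one_minute
print(df)
```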
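Note that because of the floor division, the minute counts depend on the direction of the subtraction: for example, 90 // 60 = 1, but -90 // 60 = -2. You could add one to the result when it is negative, but then the edge case of an exact whole number of minutes (measured at millisecond precision) would be off by one minute.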
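The floor-division behaviour can be checked in plain Python, and the off-by-one edge case can be avoided by dividing the absolute value (where flooring equals truncation) and restoring the sign afterwards. The helper name whole_minutes below is hypothetical, just a sketch of that idea:

```python
import pandas as pd

# Python's // floors, so positive and negative results are not symmetric
assert 90 // 60 == 1
assert -90 // 60 == -2


def whole_minutes(td: pd.Timedelta) -> int:
    """Hypothetical helper: whole minutes in td, truncated toward zero."""
    one_minute = pd.Timedelta(minutes=1)
    # Floor division on a non-negative duration is the same as truncation
    n = abs(td) // one_minute
    # Put the sign back for negative durations
    return -n if td < pd.Timedelta(0) else n


print(whole_minutes(pd.Timedelta(seconds=-90)))
print(whole_minutes(pd.Timedelta(seconds=-60)))
```

With this approach, -60 seconds maps to -1 rather than the 0 that the "add one if negative" workaround would produce.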