Suppose I have the following dataframe (df)
I want to create a column that reports the difference between successive Timestamps for each ID, which is straightforward:
df['time_diff'] = df.groupby('ID')['Timestamp'].diff()
which yields a new timedelta column, time_diff.
Finally, I want to create another column, hours_diff, that reports the value in time_diff in terms of hours, given as a float. Ignoring microsecond precision, I tried
df['hours_diff'] = df.time_diff.map(lambda t: t.days*24.0 + t.seconds/3600.0)
as well as
df.loc[df.time_diff.notnull()==True,'hours_diff'] = df.loc[df.time_diff.notnull()==True].time_diff.map(lambda t: t.days*24.0 + t.seconds/3600.0)
both of which give me
AttributeError: 'numpy.timedelta64' object has no attribute 'days'.
However, if I run the command
print(set(type(i) for i in df.time_diff))
it tells me that the values in column time_diff are of type pandas.tslib.Timedelta or pandas.tslib.NaTType, neither of which seems to be the numpy.timedelta64 type.
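Answer 0 (score: 1)
A Series of dtype timedelta64 yields Timedelta or NaT objects when you iterate over it, but functions such as .map() or apply() see the values as timedelta64.
You can access the Timedelta methods through the .dt accessor: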
import pandas as pd
deltas = pd.date_range('2000-01-01', periods=10).to_series().diff()
deltas
2000-01-01       NaT
2000-01-02    1 days
2000-01-03    1 days
2000-01-04    1 days
2000-01-05    1 days
2000-01-06    1 days
2000-01-07    1 days
2000-01-08    1 days
2000-01-09    1 days
2000-01-10    1 days
Freq: D, dtype: timedelta64[ns]
deltas.dt.days*24.0 + deltas.dt.seconds/3600.0
2000-01-01    NaN
2000-01-02     24
2000-01-03     24
2000-01-04     24
2000-01-05     24
2000-01-06     24
2000-01-07     24
2000-01-08     24
2000-01-09     24
2000-01-10     24
Freq: D, dtype: float64
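As a hedged aside, the same conversion can probably be written more compactly with Series.dt.total_seconds(), which avoids combining days and seconds by hand. This is a minimal sketch assuming a reasonably recent pandas; the variable name hours is only illustrative and is not part of the answer above.

```python
import pandas as pd

# Same setup as above: daily timestamps, then successive differences.
deltas = pd.date_range('2000-01-01', periods=10).to_series().diff()

# total_seconds() is applied element-wise through the .dt accessor.
# NaT entries come out as NaN, so no null mask is needed.
hours = deltas.dt.total_seconds() / 3600.0

print(hours)
```

Unlike the days/seconds combination, this keeps any sub-second part of each difference, so round the result if that precision is unwanted.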
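Answer 1 (score: 1)
You can divide a timedelta64 by np.timedelta64(1, 's') to get the delta in seconds. If you really want to get rid of the microsecond precision, round the result to 0 decimal places and divide by 3600 to get the delta in hours.
In fact, only the second-to-last line of the example is relevant; the rest sets up the dataframe. (I changed the second row to get something more precise that I could round.)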
import pandas as pd
import numpy as np
# Sample data: two IDs with successive timestamps.
data = [{'ID': 'X', 'Timestamp': '2014-12-15 00:00:00', 'Quantity': 4},
        {'ID': 'X', 'Timestamp': '2014-12-15 01:25:00.435', 'Quantity': 7},
        {'ID': 'X', 'Timestamp': '2014-12-15 02:00:00', 'Quantity': 5},
        {'ID': 'X', 'Timestamp': '2014-12-15 03:00:00', 'Quantity': 5},
        {'ID': 'X', 'Timestamp': '2014-12-15 04:00:00', 'Quantity': 0},
        {'ID': 'Y', 'Timestamp': '2014-12-15 00:00:00', 'Quantity': 9},
        {'ID': 'Y', 'Timestamp': '2014-12-15 01:00:00', 'Quantity': 1},
        {'ID': 'Y', 'Timestamp': '2014-12-15 02:00:00', 'Quantity': 3},
        {'ID': 'Y', 'Timestamp': '2014-12-15 03:00:00', 'Quantity': 2},
        {'ID': 'Y', 'Timestamp': '2014-12-15 04:00:00', 'Quantity': 7},
        ]
df = pd.DataFrame(data)
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
# Difference between successive timestamps within each ID.
df['time_diff'] = df.groupby('ID')['Timestamp'].diff()
# Seconds as a float, rounded to whole seconds, then converted to hours.
df['hour_diff'] = (df['time_diff']/np.timedelta64(1, 's')).round(0)/3600
print(df)
Output:
  ID  Quantity               Timestamp        time_diff  hour_diff
0  X         4 2014-12-15 00:00:00.000              NaT        NaN
1  X         7 2014-12-15 01:25:00.435  01:25:00.435000   1.416667
2  X         5 2014-12-15 02:00:00.000  00:34:59.565000   0.583333
3  X         5 2014-12-15 03:00:00.000         01:00:00   1.000000
4  X         0 2014-12-15 04:00:00.000         01:00:00   1.000000
5  Y         9 2014-12-15 00:00:00.000              NaT        NaN
6  Y         1 2014-12-15 01:00:00.000         01:00:00   1.000000
7  Y         3 2014-12-15 02:00:00.000         01:00:00   1.000000
8  Y         2 2014-12-15 03:00:00.000         01:00:00   1.000000
9  Y         7 2014-12-15 04:00:00.000         01:00:00   1.000000
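If the sub-second part does not need to be rounded away, a shorter variant is to divide by a one-hour timedelta directly. The sketch below is not part of the answer above: the small frame is made up for illustration, and hours_diff is simply the column name from the question. Only the division line matters.

```python
import numpy as np
import pandas as pd

# Hypothetical two-ID frame, smaller than the one in the answer.
df = pd.DataFrame({
    'ID': ['X', 'X', 'X', 'Y', 'Y'],
    'Timestamp': pd.to_datetime([
        '2014-12-15 00:00:00',
        '2014-12-15 01:30:00',
        '2014-12-15 02:00:00',
        '2014-12-15 00:00:00',
        '2014-12-15 03:00:00',
    ]),
})

# Difference between successive timestamps within each ID.
df['time_diff'] = df.groupby('ID')['Timestamp'].diff()

# Dividing by a one-hour timedelta gives float hours; NaT becomes NaN.
df['hours_diff'] = df['time_diff'] / np.timedelta64(1, 'h')

print(df)
```

The first row of each ID has no predecessor, so its hours_diff is NaN, just as in the rounded version.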