pandas map and timedelta with missing values

Time: 2016-02-12 21:57:31

Tags: pandas timedelta

Suppose I have the following dataframe (df):

[image: dataframe without time_diff]

I want to create a column that reports the difference between successive Timestamps for each ID, which is straightforward:

df['time_diff'] = df.groupby('ID')['Timestamp'].diff()

which yields

[image: dataframe with time_diff]

Finally, I want to create another column hours_diff that reports the value in time_diff in terms of hours, given as a float. Ignoring microsecond precision, I tried

df['hours_diff'] = df.time_diff.map(lambda t: t.days*24.0 + t.seconds/3600.0)

as well as

df.loc[df.time_diff.notnull()==True,'hours_diff'] = df.loc[df.time_diff.notnull()==True].time_diff.map(lambda t: t.days*24.0 + t.seconds/3600.0)

both of which give me

AttributeError: 'numpy.timedelta64' object has no attribute 'days'.

However, if I run the command

print(set(type(i) for i in df.time_diff))

it tells me that the data types for the values in column time_diff are either pandas.tslib.Timedelta or pandas.tslib.NaTType, neither of which seems to be the numpy.timedelta64 type.

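2 Answers:

Answer 0: (score: 1)

A Series of dtype timedelta64 yields Timedelta and NaT objects when you iterate over it, but functions such as .map() and .apply() treat its elements as numpy.timedelta64.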
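The iteration half of that statement is easy to check. The snippet below is a minimal sketch on a made-up four-day Series (the name `deltas` and the sample dates are assumptions for illustration). It only inspects what plain iteration returns; what `.map()` passes to the function may differ between pandas versions, so that part is not shown.

```python
import pandas as pd

# Hypothetical sample: four consecutive days, so diff() gives one NaT
# followed by one-day gaps.
deltas = pd.date_range('2000-01-01', periods=4).to_series().diff()

# Plain iteration boxes each element into a pandas scalar object.
element_types = {type(x).__name__ for x in deltas}

print(deltas.dtype)
print(sorted(element_types))
```

This matches what the question observed with `set(type(i) for i in df.time_diff)`: the Python-level objects seen during iteration are not the same as the storage dtype of the column.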

You can use the Timedelta methods on the Series through the .dt accessor, for example:

deltas = pd.date_range('2000-01-01', periods=10).to_series().diff()
deltas

2000-01-01      NaT
2000-01-02   1 days
2000-01-03   1 days
2000-01-04   1 days
2000-01-05   1 days
2000-01-06   1 days
2000-01-07   1 days
2000-01-08   1 days
2000-01-09   1 days
2000-01-10   1 days
Freq: D, dtype: timedelta64[ns]

deltas.dt.days*24.0 + deltas.dt.seconds/3600.0

2000-01-01   NaN
2000-01-02    24
2000-01-03    24
2000-01-04    24
2000-01-05    24
2000-01-06    24
2000-01-07    24
2000-01-08    24
2000-01-09    24
2000-01-10    24
Freq: D, dtype: float64
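As an alternative to combining `.dt.days` and `.dt.seconds` by hand, the `.dt` accessor also exposes `total_seconds()`. The following is a small sketch on made-up values (the Series contents are assumptions, not data from the question) showing that missing entries come through as NaN without any special handling.

```python
import pandas as pd

# Hypothetical sample with a missing value and a sub-day component.
deltas = pd.Series(pd.to_timedelta(['1 days 06:30:00', pd.NaT, '00:45:00']))

# total_seconds() folds days, seconds and fractions into one float;
# NaT entries become NaN.
hours = deltas.dt.total_seconds() / 3600.0

print(hours)
```

Unlike `.dt.seconds`, which only reports whole seconds, `total_seconds()` keeps any sub-second part, so round explicitly if that precision is unwanted.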

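Answer 1: (score: 1)

You can divide the timedelta64 values by np.timedelta64(1,'s') to get the delta in seconds. If you really want to get rid of the microsecond precision, just round that to 0 decimal places and divide by 3600 to get the delta in hours.

Really, only the second-to-last line of the example is relevant; the rest sets up the dataframe. (I changed the second row to have something more precise that I could round.)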

import pandas as pd
import numpy as np

data = [{'ID': 'X', 'Timestamp': '2014-12-15 00:00:00', 'Quantity': 4},
        {'ID': 'X', 'Timestamp': '2014-12-15 01:25:00.435', 'Quantity': 7},
        {'ID': 'X', 'Timestamp': '2014-12-15 02:00:00', 'Quantity': 5},
        {'ID': 'X', 'Timestamp': '2014-12-15 03:00:00', 'Quantity': 5},
        {'ID': 'X', 'Timestamp': '2014-12-15 04:00:00', 'Quantity': 0},
        {'ID': 'Y', 'Timestamp': '2014-12-15 00:00:00', 'Quantity': 9},
        {'ID': 'Y', 'Timestamp': '2014-12-15 01:00:00', 'Quantity': 1},
        {'ID': 'Y', 'Timestamp': '2014-12-15 02:00:00', 'Quantity': 3},
        {'ID': 'Y', 'Timestamp': '2014-12-15 03:00:00', 'Quantity': 2},
        {'ID': 'Y', 'Timestamp': '2014-12-15 04:00:00', 'Quantity': 7},
       ]

df = pd.DataFrame(data)
df['Timestamp'] = pd.to_datetime(df['Timestamp'])

df['time_diff'] = df.groupby('ID')['Timestamp'].diff()
df['hour_diff'] = (df['time_diff']/np.timedelta64(1, 's')).round(0)/3600

print(df)

Output:

  ID  Quantity               Timestamp       time_diff  hour_diff
0  X         4 2014-12-15 00:00:00.000             NaT        NaN
1  X         7 2014-12-15 01:25:00.435 01:25:00.435000   1.416667
2  X         5 2014-12-15 02:00:00.000 00:34:59.565000   0.583333
3  X         5 2014-12-15 03:00:00.000        01:00:00   1.000000
4  X         0 2014-12-15 04:00:00.000        01:00:00   1.000000
5  Y         9 2014-12-15 00:00:00.000             NaT        NaN
6  Y         1 2014-12-15 01:00:00.000        01:00:00   1.000000
7  Y         3 2014-12-15 02:00:00.000        01:00:00   1.000000
8  Y         2 2014-12-15 03:00:00.000        01:00:00   1.000000
9  Y         7 2014-12-15 04:00:00.000        01:00:00   1.000000
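The same division idea also works with a pandas `Timedelta` as the divisor. The sketch below uses a small made-up Series (the values and the names `exact_hours` and `rounded_hours` are assumptions for illustration) to contrast dividing straight into hours with rounding to whole seconds first, as this answer does.

```python
import pandas as pd

# Hypothetical sample: one value with milliseconds, one missing, one exact hour.
time_diff = pd.Series(pd.to_timedelta(['01:25:00.435', pd.NaT, '01:00:00']))

# Dividing by a one-hour Timedelta gives float hours directly,
# keeping the sub-second fraction; NaT becomes NaN.
exact_hours = time_diff / pd.Timedelta(hours=1)

# Rounding to whole seconds before converting drops the fraction.
rounded_hours = (time_diff / pd.Timedelta(seconds=1)).round(0) / 3600

print(pd.DataFrame({'exact': exact_hours, 'rounded': rounded_hours}))
```

Which variant is preferable depends on whether the sub-second part matters for the analysis; both leave the missing row as NaN.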