pandas datetime切片:junkdf.ix ['2015-08-03':'2015-08-06']无效

时间:2016-08-23 18:51:22

标签: datetime pandas indexing selection slice

junkdf:

            rev
dtime   
2015-08-03  20.45
2015-08-04  -2.57
2015-08-05  12.53
2015-08-06  -8.16
2015-08-07  -4.41

junkdf.reset_index()。to_dict( 'REC')

[{'dtime': datetime.date(2015, 8, 3), 'rev': 20.45},
 {'dtime': datetime.date(2015, 8, 4), 'rev': -2.5699999999999994},
 {'dtime': datetime.date(2015, 8, 5), 'rev': 12.53},
 {'dtime': datetime.date(2015, 8, 6), 'rev': -8.16},
 {'dtime': datetime.date(2015, 8, 7), 'rev': -4.41}]

junkdf.set_index('dtime',inplace=True)

为什么我不能像下面描述的那样进行任何日期时间切片:

python-pandas-dataframe-slicing-by-date-conditions

time series datetime slicing

junkdf [ '2015年8月3日':]

C:\Users\blah\Anaconda3\lib\site-packages\pandas\core\base.py in searchsorted(self, key, side, sorter)
   1112     def searchsorted(self, key, side='left', sorter=None):
   1113         # needs coercion on the key (DatetimeIndex does already)
-> 1114         return self.values.searchsorted(key, side=side, sorter=sorter)
   1115 
   1116     _shared_docs['drop_duplicates'] = (

TypeError: unorderable types: datetime.date() > str()

junkdf.ix [ '2015年8月3日': '2015年8月6日']

C:\Users\blah\Anaconda3\lib\site-packages\pandas\core\base.py in searchsorted(self, key, side, sorter)
   1112     def searchsorted(self, key, side='left', sorter=None):
   1113         # needs coercion on the key (DatetimeIndex does already)
-> 1114         return self.values.searchsorted(key, side=side, sorter=sorter)
   1115 
   1116     _shared_docs['drop_duplicates'] = (

TypeError: unorderable types: datetime.date() > str()

start = junkdf.index.searchsorted(dt.datetime(2015,8,4))

C:\Users\blah\Anaconda3\lib\site-packages\pandas\core\base.py in searchsorted(self, key, side, sorter)
   1112     def searchsorted(self, key, side='left', sorter=None):
   1113         # needs coercion on the key (DatetimeIndex does already)
-> 1114         return self.values.searchsorted(key, side=side, sorter=sorter)
   1115 
   1116     _shared_docs['drop_duplicates'] = (

TypeError: can't compare datetime.datetime to datetime.date))

但是,如果我使用dt.date():

,则以下情况有效
start = junkdf.index.searchsorted(dt.date(2015, 8, 4))
end = junkdf.index.searchsorted(dt.date(2015, 8, 6))

junkdf.ix[start:end]

                rev
    dtime   
    2015-08-04  -2.57
    2015-08-05  12.53

更新

junkdf = df[['dtime','rev']].groupby((df.dtime).dt.date).sum().copy()

df[['dtime','rev']]的样子:

dtime   rev
0   2015-08-03 07:59:59 -0.18
1   2015-08-03 08:59:59 -0.11
2   2015-08-03 09:59:59 -0.29
3   2015-08-03 10:59:59 -0.08
4   2015-08-03 11:59:59 0.69

UPDATE2:

我试过了:

df[['dtime','rev']].head()

dtime   rev
0   2015-08-03 07:59:59 -0.18
1   2015-08-03 08:59:59 -0.11
2   2015-08-03 09:59:59 -0.29
3   2015-08-03 10:59:59 -0.08
4   2015-08-03 11:59:59 0.69

df[['dtime','rev']].groupby(pd.TimeGrouper('D', key=df.dtime)).sum()

C:\Users\blah\Anaconda3\lib\site-packages\pandas\core\generic.py in __hash__(self)
    804     def __hash__(self):
    805         raise TypeError('{0!r} objects are mutable, thus they cannot be'
--> 806                         ' hashed'.format(self.__class__.__name__))
    807 
    808     def __iter__(self):

TypeError: 'Series' objects are mutable, thus they cannot be hashed

1 个答案:

答案 0 :(得分:2)

假设您有以下来源DF(我从您之前的问题中获取并进行了更改,因此我们有多天的数据):

In [85]: df
Out[85]:
              datetime  hour   rev
0  2016-05-01 01:00:00     1 -0.02
1  2016-05-01 02:00:00     2 -0.01
2  2016-05-01 03:00:00     3 -0.02
3  2016-05-01 04:00:00     4 -0.02
4  2016-05-01 05:00:00     5 -0.01
5  2016-05-02 06:00:00     6 -0.03
6  2016-05-02 07:00:00     7 -0.10
7  2016-05-02 08:00:00     8 -0.09
8  2016-05-03 09:00:00     9 -0.08
9  2016-05-03 10:00:00    10 -0.10
10 2016-05-03 11:00:00    11 -0.12
11 2016-05-04 12:00:00    12 -0.14
12 2016-05-04 13:00:00    13 -0.17
13 2016-05-04 14:00:00    14 -0.16
14 2016-05-05 15:00:00    15 -0.15
15 2016-05-05 16:00:00    16 -0.15
16 2016-05-05 17:00:00    17 -0.17
17 2016-05-06 18:00:00    18 -0.16
18 2016-05-06 19:00:00    19 -0.18
19 2016-05-06 20:00:00    20 -0.17
20 2016-05-07 21:00:00    21 -0.14
21 2016-05-07 22:00:00    22 -0.16
22 2016-05-08 23:00:00    23 -0.08
23 2016-05-08 00:00:00    24 -0.06

让我们按天分组并计算sum

In [89]: rslt = (df.assign(t=df.datetime - pd.Timedelta(hours=1))
   ....:           .groupby(pd.TimeGrouper('D', key='t'))['rev']
   ....:           .sum())

In [90]: rslt
Out[90]:
t
2016-05-01   -0.08
2016-05-02   -0.22
2016-05-03   -0.30
2016-05-04   -0.47
2016-05-05   -0.47
2016-05-06   -0.51
2016-05-07   -0.36
2016-05-08   -0.08
Freq: D, Name: rev, dtype: float64

In [92]: rslt.index.dtype
Out[92]: dtype('<M8[ns]')

现在切片应该正常工作(因为索引有datetime dtype):

In [91]: rslt.ix['2016-05-03':'2016-05-06']
Out[91]:
t
2016-05-03   -0.30
2016-05-04   -0.47
2016-05-05   -0.47
2016-05-06   -0.51
Freq: D, Name: rev, dtype: float64