Strange behavior when indexing a Pandas DataFrame by date

Time: 2015-02-02 17:35:59

Tags: python pandas

I have just finished working through a Pandas tutorial and am a bit confused by the following behavior.

In [28]: d
Out[28]: 
            Status  CustomerCount
StatusDate                       
2009-01-05       9           2519
2009-01-12      10           3351
2009-01-19      10           2188
2009-01-26      10           2301
2009-02-02       7           2204
2009-02-09       9           1538
2009-02-16       9           1983
2009-02-23       9           1960
2009-03-02      11           2887
2009-03-09       9           2927

Selecting the records for a particular month with a string works fine:

In [31]: d['2009-02']
Out[31]: 
            Status  CustomerCount
StatusDate                       
2009-02-02       7           2204
2009-02-09       9           1538
2009-02-16       9           1983
2009-02-23       9           1960

Slicing a date range also works:

In [33]: d['2009-02-09':'2009-02-10']
Out[33]: 
            Status  CustomerCount
StatusDate                       
2009-02-09       9           1538

Using the same approach to select the records for a single date does not:

In [32]: d['2009-02-09']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-32-b78c7ec0d497> in <module>()
----> 1 d['2009-02-09']

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in __getitem__(self, key)
   1676             return self._getitem_multilevel(key)
   1677         else:
-> 1678             return self._getitem_column(key)
   1679 
   1680     def _getitem_column(self, key):

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in _getitem_column(self, key)
   1683         # get column
   1684         if self.columns.is_unique:
-> 1685             return self._get_item_cache(key)
   1686 
   1687         # duplicate columns & possible reduce dimensionaility

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/core/generic.pyc in _get_item_cache(self, item)
   1050         res = cache.get(item)
   1051         if res is None:
-> 1052             values = self._data.get(item)
   1053             res = self._box_item_values(item, values)
   1054             cache[item] = res

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/core/internals.pyc in get(self, item, fastpath)
   2563 
   2564             if not isnull(item):
-> 2565                 loc = self.items.get_loc(item)
   2566             else:
   2567                 indexer = np.arange(len(self.items))[isnull(self.items)]

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/core/index.pyc in get_loc(self, key)
   1179         loc : int if unique index, possibly slice or mask if not
   1180         """
-> 1181         return self._engine.get_loc(_values_from_object(key))
   1182 
   1183     def get_value(self, series, key):

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/index.so in pandas.index.IndexEngine.get_loc (pandas/index.c:3572)()

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/index.so in pandas.index.IndexEngine.get_loc (pandas/index.c:3452)()

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/hashtable.so in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:11343)()

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/hashtable.so in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:11296)()

KeyError: '2009-02-09'

Neither does the following:

In [36]: d[d.first_valid_index()]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-36-071dd1d3c77c> in <module>()
----> 1 d[d.first_valid_index()]

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in __getitem__(self, key)
   1676             return self._getitem_multilevel(key)
   1677         else:
-> 1678             return self._getitem_column(key)
   1679 
   1680     def _getitem_column(self, key):

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in _getitem_column(self, key)
   1683         # get column
   1684         if self.columns.is_unique:
-> 1685             return self._get_item_cache(key)
   1686 
   1687         # duplicate columns & possible reduce dimensionaility

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/core/generic.pyc in _get_item_cache(self, item)
   1050         res = cache.get(item)
   1051         if res is None:
-> 1052             values = self._data.get(item)
   1053             res = self._box_item_values(item, values)
   1054             cache[item] = res

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/core/internals.pyc in get(self, item, fastpath)
   2563 
   2564             if not isnull(item):
-> 2565                 loc = self.items.get_loc(item)
   2566             else:
   2567                 indexer = np.arange(len(self.items))[isnull(self.items)]

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/core/index.pyc in get_loc(self, key)
   1179         loc : int if unique index, possibly slice or mask if not
   1180         """
-> 1181         return self._engine.get_loc(_values_from_object(key))
   1182 
   1183     def get_value(self, series, key):

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/index.so in pandas.index.IndexEngine.get_loc (pandas/index.c:3572)()

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/index.so in pandas.index.IndexEngine.get_loc (pandas/index.c:3452)()

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/hashtable.so in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:11343)()

/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/hashtable.so in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:11296)()

KeyError: Timestamp('2009-01-05 00:00:00')

But this works:

In [37]: d.loc[d.first_valid_index()]
Out[37]: 
Status              9
CustomerCount    2519
Name: 2009-01-05 00:00:00, dtype: int64

Is this behavior a bug, or am I misunderstanding something?

1 answer:

Answer 0 (score: 0)

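d is a DataFrame, so the primary job of df[key] is to index the columns (see "indexing basics" in the documentation).
The only exception is when key is a slice: for convenience, slicing a DataFrame with df[start:stop] slices the rows.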

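In your example, d['2009-02-09':'2009-02-10'] is a slice, so it correctly slices the rows. In d['2009-02-09'] you pass a single key, so pandas looks it up among the columns, and you get a KeyError because '2009-02-09' is not a column name. The same applies to d[d.first_valid_index()]: the Timestamp is a single key, so it is also treated as a column label, whereas d.loc[...] always selects by row label.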
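A minimal sketch of the distinction, rebuilding the frame from the question. The construction via `pd.date_range` and the variable names `rows`, `single_key_failed`, and `one_day` are assumptions made for illustration; the question does not show how `d` was created.

```python
import pandas as pd

# Rebuild the frame from the question (weekly dates starting 2009-01-05).
index = pd.date_range("2009-01-05", periods=10, freq="7D", name="StatusDate")
d = pd.DataFrame(
    {
        "Status": [9, 10, 10, 10, 7, 9, 9, 9, 11, 9],
        "CustomerCount": [2519, 3351, 2188, 2301, 2204, 1538, 1983, 1960, 2887, 2927],
    },
    index=index,
)

# A slice passed to d[...] is applied to the rows.
rows = d["2009-02-09":"2009-02-10"]

# A single key passed to d[...] is looked up among the column names,
# so a date string that is not a column label raises KeyError.
try:
    d["2009-02-09"]
    single_key_failed = False
except KeyError:
    single_key_failed = True

# .loc always selects by row label, so a single date works there.
one_day = d.loc[pd.Timestamp("2009-02-09")]

print(rows)
print(single_key_failed)
print(one_day)
```

Using `.loc` for row lookups avoids depending on whether a given key happens to be interpreted as a column or as a row slice.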

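d['2009-02'] is a special case that can be confusing at first. It is a single string, but it actually represents a slice covering the whole month (this feature is called partial string indexing; see the pandas documentation on that topic).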
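To make the "a month string is really a slice" idea concrete, here is a small sketch that compares the partial string with an explicit date-range slice. It uses `.loc` rather than the bare `d['2009-02']` form from the question, since `.loc` states the row selection explicitly; to my knowledge, newer pandas releases no longer accept the bare form for row selection on a DataFrame. The frame construction and the names `feb` and `explicit` are assumptions made for illustration.

```python
import pandas as pd

# Same data as in the question.
index = pd.date_range("2009-01-05", periods=10, freq="7D", name="StatusDate")
d = pd.DataFrame(
    {
        "Status": [9, 10, 10, 10, 7, 9, 9, 9, 11, 9],
        "CustomerCount": [2519, 3351, 2188, 2301, 2204, 1538, 1983, 1960, 2887, 2927],
    },
    index=index,
)

# Partial string indexing: a month-resolution string selects every row in that month.
feb = d.loc["2009-02"]

# The equivalent explicit slice over the same month.
explicit = d.loc["2009-02-01":"2009-02-28"]

print(feb)
print(feb.equals(explicit))
```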