I just finished working through a pandas tutorial and am a bit puzzled by the following behavior.
In [28]: d
Out[28]:
            Status  CustomerCount
StatusDate
2009-01-05       9           2519
2009-01-12      10           3351
2009-01-19      10           2188
2009-01-26      10           2301
2009-02-02       7           2204
2009-02-09       9           1538
2009-02-16       9           1983
2009-02-23       9           1960
2009-03-02      11           2887
2009-03-09       9           2927
Selecting the records for a particular month with a string works fine:
In [31]: d['2009-02']
Out[31]:
            Status  CustomerCount
StatusDate
2009-02-02       7           2204
2009-02-09       9           1538
2009-02-16       9           1983
2009-02-23       9           1960
Slicing a date range also works:
In [33]: d['2009-02-09':'2009-02-10']
Out[33]:
            Status  CustomerCount
StatusDate
2009-02-09       9           1538
Fetching the records for one specific date the same way does not work:
In [32]: d['2009-02-09']
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-32-b78c7ec0d497> in <module>()
----> 1 d['2009-02-09']
/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in __getitem__(self, key)
1676 return self._getitem_multilevel(key)
1677 else:
-> 1678 return self._getitem_column(key)
1679
1680 def _getitem_column(self, key):
/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in _getitem_column(self, key)
1683 # get column
1684 if self.columns.is_unique:
-> 1685 return self._get_item_cache(key)
1686
1687 # duplicate columns & possible reduce dimensionaility
/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/core/generic.pyc in _get_item_cache(self, item)
1050 res = cache.get(item)
1051 if res is None:
-> 1052 values = self._data.get(item)
1053 res = self._box_item_values(item, values)
1054 cache[item] = res
/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/core/internals.pyc in get(self, item, fastpath)
2563
2564 if not isnull(item):
-> 2565 loc = self.items.get_loc(item)
2566 else:
2567 indexer = np.arange(len(self.items))[isnull(self.items)]
/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/core/index.pyc in get_loc(self, key)
1179 loc : int if unique index, possibly slice or mask if not
1180 """
-> 1181 return self._engine.get_loc(_values_from_object(key))
1182
1183 def get_value(self, series, key):
/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/index.so in pandas.index.IndexEngine.get_loc (pandas/index.c:3572)()
/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/index.so in pandas.index.IndexEngine.get_loc (pandas/index.c:3452)()
/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/hashtable.so in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:11343)()
/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/hashtable.so in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:11296)()
KeyError: '2009-02-09'
Neither does this:
In [36]: d[d.first_valid_index()]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-36-071dd1d3c77c> in <module>()
----> 1 d[d.first_valid_index()]
/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in __getitem__(self, key)
1676 return self._getitem_multilevel(key)
1677 else:
-> 1678 return self._getitem_column(key)
1679
1680 def _getitem_column(self, key):
/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in _getitem_column(self, key)
1683 # get column
1684 if self.columns.is_unique:
-> 1685 return self._get_item_cache(key)
1686
1687 # duplicate columns & possible reduce dimensionaility
/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/core/generic.pyc in _get_item_cache(self, item)
1050 res = cache.get(item)
1051 if res is None:
-> 1052 values = self._data.get(item)
1053 res = self._box_item_values(item, values)
1054 cache[item] = res
/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/core/internals.pyc in get(self, item, fastpath)
2563
2564 if not isnull(item):
-> 2565 loc = self.items.get_loc(item)
2566 else:
2567 indexer = np.arange(len(self.items))[isnull(self.items)]
/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/core/index.pyc in get_loc(self, key)
1179 loc : int if unique index, possibly slice or mask if not
1180 """
-> 1181 return self._engine.get_loc(_values_from_object(key))
1182
1183 def get_value(self, series, key):
/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/index.so in pandas.index.IndexEngine.get_loc (pandas/index.c:3572)()
/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/index.so in pandas.index.IndexEngine.get_loc (pandas/index.c:3452)()
/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/hashtable.so in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:11343)()
/usr/local/lib/python2.7/site-packages/pandas-0.14.1-py2.7-linux-x86_64.egg/pandas/hashtable.so in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:11296)()
KeyError: Timestamp('2009-01-05 00:00:00')
But this does:
In [37]: d.loc[d.first_valid_index()]
Out[37]:
Status              9
CustomerCount    2519
Name: 2009-01-05 00:00:00, dtype: int64
Is this behavior a bug, or am I misunderstanding something?
Answer 0 (score: 0)
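d is a DataFrame, so indexing with df[key] primarily selects columns (see "indexing basics" in the documentation).
The only exception is when key is a slice: for convenience, slicing a DataFrame with [] slices its rows.
In your example, d['2009-02-09':'2009-02-10'] is a slice, so it correctly slices the rows. In d['2009-02-09'] you pass a single key, so pandas looks for it among the columns and raises a KeyError, because '2009-02-09' is not a column name. The same applies to d[d.first_valid_index()], whereas d.loc[...] looks the label up in the row index.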
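The column-versus-row distinction can be reproduced with a small sketch. The frame below is rebuilt by hand from the values printed in the question; how the original d was created is not shown, so the constructor call is an assumption.

```python
import pandas as pd

# Rebuild the frame from the question (values copied from the printed output;
# the construction itself is an assumption, not the asker's code).
dates = pd.DatetimeIndex(
    [
        "2009-01-05", "2009-01-12", "2009-01-19", "2009-01-26", "2009-02-02",
        "2009-02-09", "2009-02-16", "2009-02-23", "2009-03-02", "2009-03-09",
    ],
    name="StatusDate",
)
d = pd.DataFrame(
    {
        "Status": [9, 10, 10, 10, 7, 9, 9, 9, 11, 9],
        "CustomerCount": [2519, 3351, 2188, 2301, 2204, 1538, 1983, 1960, 2887, 2927],
    },
    index=dates,
)

# A single key inside [] is looked up among the column names.
status = d["Status"]

# A date string is not a column name, so the same syntax raises KeyError.
try:
    d["2009-02-09"]
    raised = False
except KeyError:
    raised = True

# .loc is label-based on the rows, so it finds the date in the index.
row = d.loc[pd.Timestamp("2009-02-09")]
print(raised, row["CustomerCount"])
```

Using .loc for row lookups avoids the ambiguity entirely, since it never falls back to column selection for a single label.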
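d['2009-02'] is a special case that can be a bit confusing at first. It is a single string, but it actually represents a slice (this feature is called partial string indexing; see the pandas documentation on time series indexing).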
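A minimal sketch of partial string indexing written with .loc, which states explicitly that rows are meant. The frame is rebuilt from the question's output under the assumption that the dates are evenly spaced seven days apart, as the printed index suggests. As far as I know, newer pandas releases deprecated and later removed the bare d['2009-02'] form on a DataFrame because of this very ambiguity, so .loc is the safer spelling across versions.

```python
import pandas as pd

# Rebuild the frame from the question; the weekly date_range is an assumption
# that happens to reproduce the ten dates shown in the printed index.
dates = pd.date_range("2009-01-05", periods=10, freq="7D", name="StatusDate")
d = pd.DataFrame(
    {
        "Status": [9, 10, 10, 10, 7, 9, 9, 9, 11, 9],
        "CustomerCount": [2519, 3351, 2188, 2301, 2204, 1538, 1983, 1960, 2887, 2927],
    },
    index=dates,
)

# A month string is coarser than the daily index, so it acts as a slice
# covering every row in February 2009.
february = d.loc["2009-02"]

# An explicit slice of date strings also selects rows (both ends inclusive).
window = d.loc["2009-02-09":"2009-02-10"]

# A Timestamp label selects exactly one row and returns it as a Series.
first = d.loc[d.first_valid_index()]

print(len(february), len(window), first["CustomerCount"])
```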