python - pandas: how to select by date

Asked: 2017-05-19 07:57:18

Tags: python pandas

Why can I select by month in this case, but not by date?

dates = pd.date_range(start="01/01/1931", end="01/02/1941")
new_df_4 = new_df_3.reindex(dates)
new_df_4["1931-10"]


But this does not work:

new_df_4["1931-10-02"]

KeyError                                  Traceback (most recent call last)
 in ()
----> 1 new_df_4["1931-10-02"]

/Users/romain/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in __getitem__(self, key)
   1990             return self._getitem_multilevel(key)
   1991         else:
-> 1992             return self._getitem_column(key)
   1993 
   1994     def _getitem_column(self, key):

/Users/romain/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in _getitem_column(self, key)
   2002         result = self._constructor(self._data.get(key))
   2003         if result.columns.is_unique:
-> 2004             result = result[key]
   2005 
   2006         return result

/Users/romain/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in __getitem__(self, key)
   1990             return self._getitem_multilevel(key)
   1991         else:
-> 1992             return self._getitem_column(key)
   1993 
   1994     def _getitem_column(self, key):

/Users/romain/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in _getitem_column(self, key)
   1997         # get column
   1998         if self.columns.is_unique:
-> 1999             return self._get_item_cache(key)
   2000 
   2001         # duplicate columns & possible reduce dimensionality

/Users/romain/anaconda/lib/python2.7/site-packages/pandas/core/generic.pyc in _get_item_cache(self, item)
   1343         res = cache.get(item)
   1344         if res is None:
-> 1345             values = self._data.get(item)
   1346             res = self._box_item_values(item, values)
   1347             cache[item] = res

/Users/romain/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in get(self, item, fastpath)
   3223 
   3224             if not isnull(item):
-> 3225                 loc = self.items.get_loc(item)
   3226             else:
   3227                 indexer = np.arange(len(self.items))[isnull(self.items)]

/Users/romain/anaconda/lib/python2.7/site-packages/pandas/indexes/base.pyc in get_loc(self, key, method, tolerance)
   1876                 return self._engine.get_loc(key)
   1877             except KeyError:
-> 1878                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   1879 
   1880         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4027)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3891)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12408)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12359)()

KeyError: '1931-10-02'

1 answer:

Answer 0 (score: 4)

Selecting by month works because it uses partial string indexing:

print (new_df_4["1931-10"])

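This does not work when the string has the same resolution as the index (from the same docs):

Warning: However, if the string is treated as an exact match, the selection in DataFrame's [] will be column-wise and not row-wise, see Indexing Basics. For example, dft_minute['2011-12-31 23:59'] will raise KeyError as '2012-12-31 23:59' has the same resolution as the index and there is no column with such name. To always have unambiguous selection, whether the row is treated as a slice or a single selection, use .loc.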

In [95]: dft_minute.loc['2011-12-31 23:59']
Out[95]: 
a    1
b    4
Name: 2011-12-31 23:59:00, dtype: int64
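
To make the rule in the quoted warning concrete, here is a minimal sketch on a daily index. The frame `df` and its column `a` are made up for illustration and are not the asker's data. As far as I know, newer pandas releases no longer support row selection through plain [] with a date string at all, so the sketch relies only on .loc for rows.

```python
import numpy as np
import pandas as pd

# Hypothetical daily frame covering 1931; column "a" is just a running counter.
dates = pd.date_range(start="1931-01-01", end="1931-12-31")
df = pd.DataFrame({"a": np.arange(len(dates))}, index=dates)

# A day-resolution string matches the index resolution, so [] treats it as a
# column label. There is no such column, hence the KeyError.
try:
    df["1931-10-02"]
    raised = False
except KeyError:
    raised = True

# .loc always selects rows.
row = df.loc["1931-10-02"]   # exact match on a daily index: one row as a Series
month = df.loc["1931-10"]    # coarser than the index: a slice of 31 rows

print(raised)
print(type(row).__name__, len(month))
```

The same string can therefore mean a single row or a slice, depending on how its resolution compares with that of the index.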

If you need to select by date, you can use loc:

new_df_4.loc["1931-10-02"]

Sample:

import numpy as np
import pandas as pd

np.random.seed(10)
dates = pd.date_range(start="01/01/1931", end="01/02/1941")
new_df_4 = pd.DataFrame({'a': np.random.randint(10, size=len(dates))}, index=dates)
print(new_df_4.head())
            a
1931-01-01  9
1931-01-02  4
1931-01-03  0
1931-01-04  1
1931-01-05  9

print (new_df_4["1931-10"])
            a
1931-10-01  9
1931-10-02  6
1931-10-03  9
1931-10-04  7
1931-10-05  8
1931-10-06  0
1931-10-07  9
1931-10-08  6
1931-10-09  0
1931-10-10  1
1931-10-11  0
...

print (new_df_4.loc["1931-10-02"])
a    6
Name: 1931-10-02 00:00:00, dtype: int32
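
As the output above shows, .loc with a single date string returns a Series. If a DataFrame is preferred, or a short range of days is needed, something along these lines should work. This is a sketch built on the sample frame above; the names `one_day` and `few_days` are mine.

```python
import numpy as np
import pandas as pd

np.random.seed(10)
dates = pd.date_range(start="01/01/1931", end="01/02/1941")
new_df_4 = pd.DataFrame({"a": np.random.randint(10, size=len(dates))}, index=dates)

# Passing a list of labels keeps the result two-dimensional (a 1-row DataFrame).
one_day = new_df_4.loc[[pd.Timestamp("1931-10-02")]]

# Label slices on .loc include both endpoints, so this returns 4 rows.
few_days = new_df_4.loc["1931-10-02":"1931-10-05"]

print(one_day.shape)
print(few_days.index.min(), few_days.index.max())
```

The list form is handy when downstream code expects a DataFrame regardless of how many rows match.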