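I'm trying to get the latest instance of each group using the code below. It does what I want, except that it converts the Timestamp to numpy.datetime64 and shifts the date back by one day. That doesn't look like correct behavior. Is this a bug, or am I missing something?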
In [37]: df
Out[37]:
ticker currency date
0 AACE NaN NaT
1 AAP US Dollar 2012-12-29 00:00:00
2 AAP US Dollar 2013-04-20 00:00:00
3 AAP US Dollar 2013-07-13 00:00:00
4 ABBEY British Pound 2012-12-31 00:00:00
5 ABBEY British Pound 2013-03-30 00:00:00
6 ABBEY British Pound 2013-06-30 00:00:00
7 ABBNVX NaN NaT
8 ABBV US Dollar 2012-12-31 00:00:00
9 ABBV US Dollar 2013-03-31 00:00:00
10 ABBV US Dollar 2013-06-30 00:00:00
In [38]: df.date[3]
Out[38]: Timestamp('2013-07-13 00:00:00', tz=None)
In [39]: df.groupby('ticker').last()
Out[39]:
currency date
ticker
AACE NaN NaN
AAP US Dollar 2013-07-12T17:00:00.000000000-0700
ABBEY British Pound 2013-06-29T17:00:00.000000000-0700
ABBNVX NaN NaN
ABBV US Dollar 2013-06-29T17:00:00.000000000-0700
In [40]: df.groupby('ticker').last().date[1]
Out[40]: numpy.datetime64('2013-07-12T17:00:00.000000000-0700')
In [41]:
Edit:
I no longer have the original example, but here is another one that reproduces the same behavior.
In [57]: df
Out[57]:
ticker currency date
3227 WWW US Dollar 2013-03-23 00:00:00
3228 WWW US Dollar 2012-12-29 00:00:00
3229 WWW US Dollar 2013-06-15 00:00:00
3230 WWW US Dollar 2013-09-07 00:00:00
3231 WYLE NaN NaT
3232 YALUNI NaN NaT
3233 YKBNK NaN NaT
3234 YZCOAL NaN NaT
3235 ZACHRY NaN NaT
3236 ZAYOGR US Dollar 2013-03-31 00:00:00
3237 ZAYOGR US Dollar 2013-06-30 00:00:00
3238 ZAYOGR US Dollar 2012-12-31 00:00:00
3239 ZINC US Dollar 2013-06-30 00:00:00
3240 ZINC US Dollar 2012-12-31 00:00:00
3241 ZINC US Dollar 2013-03-31 00:00:00
In [58]: df.dtypes
Out[58]:
ticker object
currency object
date datetime64[ns]
dtype: object
In [59]: df.tail(7).groupby('ticker').last()
Out[59]:
currency date
ticker
ZACHRY NaN NaN
ZAYOGR US Dollar 2012-12-30T16:00:00.000000000-0800
ZINC US Dollar 2013-03-30T17:00:00.000000000-0700
In [60]: df.tail(6).groupby('ticker').last()
Out[60]:
currency date
ticker
ZAYOGR US Dollar 2012-12-31 00:00:00
ZINC US Dollar 2013-03-31 00:00:00
In [61]:
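It looks like the Timestamp column only gets messed up when NaT values are present.
Answer 0: (score: 0)
Those times look correct; they are the same UTC instants displayed with a time zone offset (e.g. the -0700 in 2013-07-12T17:00:00.00-0700).
See below: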
In [93]: x = np.datetime64('2013-07-12T17:00:00.000000000-0700')
In [94]: x
Out[94]: numpy.datetime64('2013-07-12T17:00:00.000000000-0700')
In [95]: pandas.Timestamp(x)
Out[95]: Timestamp('2013-07-13 00:00:00', tz=None)
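As for why they get converted this way: I'm not sure. It could be a bug, but it should be simple enough to use apply to convert the values back and keep things working.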
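A minimal sketch of that conversion, under some assumptions: the `broken` Series below is a hypothetical stand-in for the `date` column of the groupby result (an object column holding numpy.datetime64 values and NaN), and `pd.to_datetime` is swapped in for an element-wise `apply` because it converts the whole column in one call. It has not been checked against the pandas version from the question.

```python
import numpy as np
import pandas as pd

# A naive Timestamp like the ones in the original `date` column.
ts = pd.Timestamp("2013-07-13 00:00:00")

# The raw NumPy value that the Timestamp wraps; wrapping it again
# gives back the same instant.
raw = ts.to_datetime64()
same_instant = pd.Timestamp(raw) == ts

# Hypothetical stand-in for the groupby result: an object column that
# holds numpy.datetime64 values and NaN instead of a datetime column.
broken = pd.Series([np.nan, raw], index=["AACE", "AAP"], dtype=object)

# Convert the whole column back; NaN becomes NaT.
fixed = pd.to_datetime(broken)
print(fixed)
```

If the values really are the same instants, as the session above suggests, the converted column should show the original dates again.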
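Answer 1: (score: 0)
It's not clear how you constructed your example. Please show the actual frame and its dtypes. You are probably using an object dtype (since the values have a time zone attached), so the column is not being interpreted correctly.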
In [10]: df = pd.DataFrame(dict(
    A = ['AACE','AAP','AAP','ABBEY','ABBEY'],
    B = ['20121229','20130420','20130723','20121231','20130330']))
In [11]: df['B'] = pd.to_datetime(df['B'])
In [12]: df
Out[12]:
A B
0 AACE 2012-12-29 00:00:00
1 AAP 2013-04-20 00:00:00
2 AAP 2013-07-23 00:00:00
3 ABBEY 2012-12-31 00:00:00
4 ABBEY 2013-03-30 00:00:00
In [13]: df.groupby('A').last()
Out[13]:
B
A
AACE 2012-12-29 00:00:00
AAP 2013-07-23 00:00:00
ABBEY 2013-03-30 00:00:00
In [14]: df.groupby('A').last().dtypes
Out[14]:
B datetime64[ns]
dtype: object
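The frame above has no missing dates, while the question's edit suggests the problem only shows up when NaT is present. A rough way to check that case on your own installation is sketched below; the rows are copied from part of the question's first frame, and the variable names are illustrative only. On recent pandas versions the `date` column is expected to stay a datetime dtype, though that is an assumption about your installed version rather than something shown in this thread.

```python
import numpy as np
import pandas as pd

# Part of the question's first frame, including a ticker whose only
# row has NaN currency and NaT date.
df = pd.DataFrame({
    "ticker": ["AACE", "AAP", "AAP", "AAP", "ABBEY", "ABBEY"],
    "currency": [np.nan, "US Dollar", "US Dollar", "US Dollar",
                 "British Pound", "British Pound"],
    "date": pd.to_datetime([None, "2012-12-29", "2013-04-20",
                            "2013-07-13", "2012-12-31", "2013-03-30"]),
})

# Last non-null value of each column per ticker.
last = df.groupby("ticker").last()

print(last)
print(last.dtypes)
```

If `last.dtypes` reports object for `date`, the column has been downcast and can be converted back with `pd.to_datetime`.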