Get MM-DD-YYYY from a pandas Timestamp

Asked: 2013-10-01 00:16:52

Tags: python date pandas

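Dates seem to be a tricky thing in Python, and I am having a lot of trouble simply stripping the date out of a pandas Timestamp. I would like to go from 2013-09-29 02:34:44 to simply 09-29-2013.

I have a dataframe with a Created_Date column:

Name: Created_Date, Length: 1162549, dtype: datetime64[ns]

I tried applying the .date() method to this Series, e.g. df.Created_Date.date(), but I get the error AttributeError: 'Series' object has no attribute 'date'

Can someone help me out?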

4 answers:

Answer 0: (score: 32)

map over the elements:

In [239]: from operator import methodcaller

In [240]: s = Series(date_range(Timestamp('now'), periods=2))

In [241]: s
Out[241]:
0   2013-10-01 00:24:16
1   2013-10-02 00:24:16
dtype: datetime64[ns]

In [238]: s.map(lambda x: x.strftime('%d-%m-%Y'))
Out[238]:
0    01-10-2013
1    02-10-2013
dtype: object

In [242]: s.map(methodcaller('strftime', '%d-%m-%Y'))
Out[242]:
0    01-10-2013
1    02-10-2013
dtype: object
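
Note that '%d-%m-%Y' puts the day first, while the question asks for month first (09-29-2013). A minimal sketch of the same idea with '%m-%d-%Y' follows; the sample values and the names mapped and vectorised are only illustrative, and the .dt.strftime call assumes a pandas version recent enough to have the .dt accessor:

```python
import pandas as pd

# Sample values; the first one is the timestamp quoted in the question.
s = pd.Series(pd.to_datetime(["2013-09-29 02:34:44", "2013-10-01 00:24:16"]))

# Element-wise, like the map() calls above, but with the month first.
mapped = s.map(lambda ts: ts.strftime("%m-%d-%Y"))

# Vectorised equivalent through the .dt accessor.
vectorised = s.dt.strftime("%m-%d-%Y")

print(mapped.tolist())  # ['09-29-2013', '10-01-2013']
```

Either way the result holds strings rather than dates, so it is best done as the last step before display or export.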

You can get the raw datetime.date objects by calling the date() method of the Timestamp elements that make up the Series:

In [249]: s.map(methodcaller('date'))
Out[249]:
0    2013-10-01
1    2013-10-02
dtype: object

In [250]: s.map(methodcaller('date')).values
Out[250]:
array([datetime.date(2013, 10, 1), datetime.date(2013, 10, 2)], dtype=object)

Yet another way is to call the unbound Timestamp.date method:

In [273]: s.map(Timestamp.date)
Out[273]:
0    2013-10-01
1    2013-10-02
dtype: object

This approach is the fastest and, IMHO, the most readable. Timestamp is accessible in the top-level pandas module as pandas.Timestamp. I have imported it directly for illustration purposes.

The date attribute of DatetimeIndex objects does something similar, but returns a numpy object array instead:

In [243]: index = DatetimeIndex(s)

In [244]: index
Out[244]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-10-01 00:24:16, 2013-10-02 00:24:16]
Length: 2, Freq: None, Timezone: None

In [246]: index.date
Out[246]:
array([datetime.date(2013, 10, 1), datetime.date(2013, 10, 2)], dtype=object)

For larger datetime64[ns] Series objects, calling Timestamp.date is faster than operator.methodcaller, which in turn is slightly faster than a lambda:

In [263]: f = methodcaller('date')

In [264]: flam = lambda x: x.date()

In [265]: fmeth = Timestamp.date

In [266]: s2 = Series(date_range('20010101', periods=1000000, freq='T'))

In [267]: s2
Out[267]:
0    2001-01-01 00:00:00
1    2001-01-01 00:01:00
2    2001-01-01 00:02:00
3    2001-01-01 00:03:00
4    2001-01-01 00:04:00
5    2001-01-01 00:05:00
6    2001-01-01 00:06:00
7    2001-01-01 00:07:00
8    2001-01-01 00:08:00
9    2001-01-01 00:09:00
10   2001-01-01 00:10:00
11   2001-01-01 00:11:00
12   2001-01-01 00:12:00
13   2001-01-01 00:13:00
14   2001-01-01 00:14:00
...
999985   2002-11-26 10:25:00
999986   2002-11-26 10:26:00
999987   2002-11-26 10:27:00
999988   2002-11-26 10:28:00
999989   2002-11-26 10:29:00
999990   2002-11-26 10:30:00
999991   2002-11-26 10:31:00
999992   2002-11-26 10:32:00
999993   2002-11-26 10:33:00
999994   2002-11-26 10:34:00
999995   2002-11-26 10:35:00
999996   2002-11-26 10:36:00
999997   2002-11-26 10:37:00
999998   2002-11-26 10:38:00
999999   2002-11-26 10:39:00
Length: 1000000, dtype: datetime64[ns]

In [269]: timeit s2.map(f)
1 loops, best of 3: 1.04 s per loop

In [270]: timeit s2.map(flam)
1 loops, best of 3: 1.1 s per loop

In [271]: timeit s2.map(fmeth)
1 loops, best of 3: 968 ms per loop

Note that one of the goals of pandas is to provide a layer on top of numpy so that (most of the time) you don't have to deal with the low-level details of the ndarray. Getting the raw datetime.date objects in an array is therefore of limited use, since they don't correspond to any numpy.dtype that pandas supports (pandas only supports datetime64[ns] dtypes, that is, nanosecond resolution). That said, sometimes you need to do this.
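
To make the dtype point concrete, here is a small sketch under the assumption that the .dt accessor (used in the later answers) is available. It compares extracting datetime.date objects with .dt.normalize(), which only resets the time to midnight; the names dates and midnight are illustrative:

```python
import datetime

import pandas as pd

s = pd.Series(pd.to_datetime(["2013-10-01 00:24:16", "2013-10-02 00:24:16"]))

# Plain datetime.date objects have no matching numpy dtype,
# so the resulting Series falls back to the generic object dtype.
dates = s.dt.date

# normalize() sets the time part to midnight but keeps a datetime64 dtype,
# so datetime-aware operations remain available afterwards.
midnight = s.dt.normalize()

print(dates.dtype)     # object
print(midnight.dtype)  # a datetime64 dtype; the exact unit may vary by pandas version
```

If the goal is only to drop the time of day while continuing to work in pandas, the normalized Series may be the more convenient of the two.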

Answer 1: (score: 3)

Maybe this has only been added recently, but there are built-in methods for this. Try:

In [27]: s = pd.Series(pd.date_range(pd.Timestamp('now'), periods=2))
In [28]: s
Out[28]: 
0   2016-02-11 19:11:43.386016
1   2016-02-12 19:11:43.386016
dtype: datetime64[ns]
In [29]: s.dt.to_pydatetime()
Out[29]: 
array([datetime.datetime(2016, 2, 11, 19, 11, 43, 386016),
       datetime.datetime(2016, 2, 12, 19, 11, 43, 386016)], dtype=object)

Answer 2: (score: 2)

You can try using .dt.date on a datetime64[ns] column of the dataframe.

For example: df['Created_date'] = df['Created_date'].dt.date

Input dataframe, named test_df:

print(test_df)

Result:

         Created_date
0     2015-03-04 15:39:16
1     2015-03-22 17:36:49
2     2015-03-25 22:08:45
3     2015-03-16 13:45:20
4     2015-03-19 18:53:50

Check the dtypes:

print(test_df.dtypes)

Result:

Created_date    datetime64[ns]
dtype: object

Extract the date and update the Created_date column:

test_df['Created_date'] = test_df['Created_date'].dt.date
print(test_df)

Result:

  Created_date
0   2015-03-04
1   2015-03-22
2   2015-03-25
3   2015-03-16
4   2015-03-19
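
One thing to watch with this approach: once the column holds datetime.date objects it is no longer datetime64, so the .dt accessor stops working on it. The sketch below rebuilds a two-row version of test_df and creates the MM-DD-YYYY strings from the question before overwriting the column. The column name Created_str and the flag dt_still_works are hypothetical names used only for illustration:

```python
import pandas as pd

# Two rows copied from the test_df shown above.
test_df = pd.DataFrame({
    "Created_date": pd.to_datetime(["2015-03-04 15:39:16", "2015-03-22 17:36:49"])
})

# Build the formatted strings while the column is still datetime64.
test_df["Created_str"] = test_df["Created_date"].dt.strftime("%m-%d-%Y")

# Overwrite the timestamps with datetime.date objects, as in the answer.
test_df["Created_date"] = test_df["Created_date"].dt.date

# The column is now object dtype, so .dt is no longer usable on it.
try:
    test_df["Created_date"].dt.year
    dt_still_works = True
except AttributeError:
    dt_still_works = False

print(test_df)
print(dt_still_works)  # False
```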

Answer 3: (score: 0)

Well, this is how I would do it.

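import pandas as pd
# timeStamp, years and i come from the answerer's own code and are not defined here
pdTime = pd.date_range(timeStamp, periods=len(years), freq="D")
pdTime[i].strftime('%m-%d-%Y')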
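
A self-contained version of the same idea, as a sketch: the start date and the number of periods are made up for illustration, since the original variables are not shown. It also formats the whole DatetimeIndex in one call instead of one element at a time:

```python
import pandas as pd

# Illustrative stand-ins for the answer's timeStamp and len(years).
idx = pd.date_range("2013-09-29", periods=3, freq="D")

# Format a single element, as in the answer.
one = idx[0].strftime("%m-%d-%Y")

# Format every element at once; this returns an Index of strings.
all_days = idx.strftime("%m-%d-%Y")

print(one)             # 09-29-2013
print(list(all_days))  # ['09-29-2013', '09-30-2013', '10-01-2013']
```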