Deriving new data from a Pandas datetime column

Time: 2014-06-13 16:22:22

Tags: python datetime pandas

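I have a column of timestamps (in milliseconds) in a pandas DataFrame. From the timestamps, I am trying to derive the hour, minute, day of the week, and month into separate columns.

I tried using apply on the whole column, but to no avail. So I fell back on a very naive (and not very concise) approach to create these columns: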

import pandas as pd

df = pd.DataFrame({
    'time': [1401811621559, 1402673694105, 1402673749561, 1401811615479, 1402673708254],
    'person': ['Harry', 'Ann', 'Sue', 'Jeremy', 'Anne'],
})

# Convert the millisecond epoch timestamps to datetime values
df['time'] = pd.to_datetime(df.time, unit='ms')
days = []
tod = []
month = []
minutes = []
for row in df['time']:
    days.append(row.strftime('%w'))
    tod.append(row.strftime('%H'))
    month.append(row.strftime('%m'))
    minutes.append(row.strftime('%M'))
# Attach the collected values as new columns
df['dayOfWeek'] = days
df['timeOfDay'] = tod
df['month'] = month
df['minutes'] = minutes

Is there a way to do it like this instead?

df['dayOfWeek'] = df['time'].apply(strftime('%w'),axis = 1)

    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    NameError: name 'strftime' is not defined

3 Answers:

Answer 0: (score: 2)

Currently you have to wrap the column in a DatetimeIndex:

In [11]: dti = pd.DatetimeIndex(df['time'])

In [12]: dti.dayofweek
Out[12]: array([1, 4, 4, 1, 4])

In [13]: dti.time
Out[13]:
array([datetime.time(16, 7, 1, 559000), datetime.time(15, 34, 54, 105000),
       datetime.time(15, 35, 49, 561000), datetime.time(16, 6, 55, 479000),
       datetime.time(15, 35, 8, 254000)], dtype=object)

In [14]: dti.month
Out[14]: array([6, 6, 6, 6, 6])

In [15]: dti.minute
Out[15]: array([ 7, 34, 35,  6, 35])

See this issue about making these methods available directly on a datetime Series.
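For readers on a newer pandas, here is a minimal sketch of the same extraction without the DatetimeIndex wrapper. It assumes a pandas version that provides the Series.dt accessor (0.15 or later, as far as I know), which was not available when this answer was written. The column names are the ones from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'time': [1401811621559, 1402673694105, 1402673749561, 1401811615479, 1402673708254],
    'person': ['Harry', 'Ann', 'Sue', 'Jeremy', 'Anne'],
})
df['time'] = pd.to_datetime(df['time'], unit='ms')

# Assumption: Series.dt exists in the installed pandas version.
# Each attribute returns an integer Series aligned with df's index.
df['dayOfWeek'] = df['time'].dt.dayofweek  # Monday=0 ... Sunday=6
df['timeOfDay'] = df['time'].dt.hour
df['month'] = df['time'].dt.month
df['minutes'] = df['time'].dt.minute
```

Note that dayofweek counts from Monday=0, whereas strftime('%w') counts from Sunday=0, so the two approaches give different numbers for the same day. These columns also hold integers rather than the zero-padded strings that strftime produces.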

Answer 1: (score: 1)

You can also wrap it in a lambda function:

df['dayOfWeek2'] = df.time.apply(lambda x:x.strftime('%w'))

Now typing

df.dayOfWeek2 == df.dayOfWeek

yields

0    True
1    True
2    True
3    True
4    True
dtype: bool
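The lambda approach extends to all four columns from the question. The following is only a sketch: the `formats` dictionary is a name introduced here for illustration, mapping each target column to the strftime code used in the question's loop.

```python
import pandas as pd

df = pd.DataFrame({
    'time': [1401811621559, 1402673694105, 1402673749561, 1401811615479, 1402673708254],
    'person': ['Harry', 'Ann', 'Sue', 'Jeremy', 'Anne'],
})
df['time'] = pd.to_datetime(df['time'], unit='ms')

# Hypothetical helper mapping: target column name -> strftime format code
formats = {'dayOfWeek': '%w', 'timeOfDay': '%H', 'month': '%m', 'minutes': '%M'}

for column, fmt in formats.items():
    # Series.apply calls the lambda once per Timestamp in the column
    df[column] = df['time'].apply(lambda ts: ts.strftime(fmt))
```

The original attempt failed because strftime is a method of each Timestamp rather than a standalone function, so apply needs a callable that invokes it on each element. The resulting columns contain strings such as '06' and '07', the same as the loop in the question.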

Answer 2: (score: 0)

Yes, with a slight modification to your code...

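def timeGroups(row):
    # row is a single DataFrame row, so its columns are accessed by name
    row['dayOfWeek'] = row['time'].strftime('%w')
    # do the same thing for month, seconds, etc.
    return row

# axis=1 passes each row to timeGroups; call apply on the DataFrame, not on df['time']
df = df.apply(timeGroups, axis=1)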