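I have a column of timestamps (in milliseconds) in a pandas DataFrame. From each timestamp I am trying to derive the hour, minute, day of week, and month into separate columns.
I tried calling apply on the whole column, but without success. So I fell back on a very naive (and not very concise) way of creating these columns: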
import pandas as pd

df = pd.DataFrame({'time': [1401811621559, 1402673694105, 1402673749561, 1401811615479, 1402673708254],
                   'person': ['Harry', 'Ann', 'Sue', 'Jeremy', 'Anne']})
# Convert the millisecond epoch values to datetime64
df['time'] = pd.to_datetime(df.time, unit='ms')
days = []
tod = []
month = []
minutes = []
for row in df['time']:
    days.append(row.strftime('%w'))     # day of week as a string, Sunday = '0'
    tod.append(row.strftime('%H'))      # hour, zero-padded
    month.append(row.strftime('%m'))    # month, zero-padded
    minutes.append(row.strftime('%M'))  # minute, zero-padded

df['dayOfWeek'] = days
df['timeOfDay'] = tod
df['month'] = month
df['minutes'] = minutes
Is there a way to do something like this instead?
df['dayOfWeek'] = df['time'].apply(strftime('%w'),axis = 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'strftime' is not defined
Answer 0 (score: 2)
Currently you have to wrap the column in a DatetimeIndex:
In [11]: dti = pd.DatetimeIndex(df['time'])
In [12]: dti.dayofweek
Out[12]: array([1, 4, 4, 1, 4])
In [13]: dti.time
Out[13]:
array([datetime.time(16, 7, 1, 559000), datetime.time(15, 34, 54, 105000),
       datetime.time(15, 35, 49, 561000), datetime.time(16, 6, 55, 479000),
       datetime.time(15, 35, 8, 254000)], dtype=object)
In [14]: dti.month
Out[14]: array([6, 6, 6, 6, 6])
In [15]: dti.minute
Out[15]: array([ 7, 34, 35, 6, 35])
and so on.
See this issue for the discussion about making these attributes available directly on a datetime Series.
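For readers on a later pandas release: as far as I know, the `.dt` accessor (added in pandas 0.15) exposes the same attributes directly on a datetime Series, so the DatetimeIndex wrapper is no longer needed. The following is a minimal sketch under that assumption, reusing the sample data and column names from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'time': [1401811621559, 1402673694105, 1402673749561,
             1401811615479, 1402673708254],
    'person': ['Harry', 'Ann', 'Sue', 'Jeremy', 'Anne'],
})
# Convert the millisecond epoch values to datetime64
df['time'] = pd.to_datetime(df['time'], unit='ms')

# The .dt accessor returns integer Series aligned with df's index
df['dayOfWeek'] = df['time'].dt.dayofweek  # Monday = 0 ... Sunday = 6
df['timeOfDay'] = df['time'].dt.hour
df['month'] = df['time'].dt.month
df['minutes'] = df['time'].dt.minute
```

Note that these columns hold integers, whereas the strftime approach in the question produces zero-padded strings. The day-of-week numbering also differs: `dayofweek` counts from Monday = 0, while `'%w'` counts from Sunday = 0.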
Answer 1 (score: 1)
You can also pass a lambda function to apply:
df['dayOfWeek2'] = df.time.apply(lambda x:x.strftime('%w'))
Now typing
df.dayOfWeek2 == df.dayOfWeek
yields
0    True
1    True
2    True
3    True
4    True
dtype: bool
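The same lambda pattern can build all four columns from the question in one loop. This is only a sketch: the `formats` dictionary is a hypothetical helper introduced here for illustration, and the sample is trimmed to the first two timestamps.

```python
import pandas as pd

df = pd.DataFrame({
    'time': pd.to_datetime([1401811621559, 1402673694105], unit='ms'),
})

# Hypothetical mapping from target column name to strftime format code
formats = {'dayOfWeek': '%w', 'timeOfDay': '%H', 'month': '%m', 'minutes': '%M'}

for column, fmt in formats.items():
    # Bind fmt as a default argument so each lambda keeps its own format
    df[column] = df['time'].apply(lambda ts, fmt=fmt: ts.strftime(fmt))
```

Each resulting column contains strings, exactly as in the loop from the question.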
Answer 2 (score: 0)
Yes, with a slight modification to your code...
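def timeGroups(row):
    # row is one DataFrame row; add the derived field to it
    row['dayOfWeek'] = row['time'].strftime('%w')
    # do the same thing for month, seconds, etc.
    return row

# axis=1 is only valid on DataFrame.apply, so apply to the whole frame
df = df.apply(timeGroups, axis=1)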
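A variant of the row-wise idea that avoids mutating the row passed in is to return a new Series of derived fields and join it back onto the frame. This is a sketch rather than the answerer's exact method: the function name `time_groups` and the trimmed two-row sample are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'time': pd.to_datetime([1401811621559, 1402673694105], unit='ms'),
    'person': ['Harry', 'Ann'],
})

def time_groups(row):
    """Return the derived time fields for one row as a new Series."""
    ts = row['time']
    return pd.Series({
        'dayOfWeek': ts.strftime('%w'),
        'timeOfDay': ts.strftime('%H'),
        'month': ts.strftime('%m'),
        'minutes': ts.strftime('%M'),
    })

# apply(axis=1) calls time_groups once per row and stacks the results
# into a DataFrame; join aligns it with df on the index
df = df.join(df.apply(time_groups, axis=1))
```

Row-wise apply calls a Python function for every row, so on large frames it is generally slower than column-wise operations.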