Computing a date field in pandas

Time: 2015-03-06 16:12:17

Tags: pandas

I have a data frame that looks like this:

import datetime as dt
from io import StringIO  # Python 3; on Python 2 this was `from StringIO import StringIO`

import pandas as pd

myst="""Uptime: 905034  Threads: 5  Questions: 1215  Slow queries: 3  Opens: 190  Flush tables: 1  Open tables: 4  Queries per second avg: 0.001
Uptime: 905094  Threads: 5  Questions: 1216  Slow queries: 3  Opens: 190  Flush tables: 1  Open tables: 4  Queries per second avg: 0.001
Uptime: 905154  Threads: 5  Questions: 1217  Slow queries: 3  Opens: 190  Flush tables: 1  Open tables: 4  Queries per second avg: 0.001
"""
# Splitting on single spaces yields 29 fields per line: column0 .. column28
u_cols = ['column' + str(i) for i in range(29)]

df = pd.read_csv(StringIO(myst), sep=' ', names=u_cols)

I tried:

df['IST_DATE']=df['column1'].apply((lambda x: dt.datetime.today() - dt.timedelta(seconds=60)))

In [127]: df[['column1','IST_DATE']]

Out[127]:
    column1     IST_DATE
0   905034  2015-03-06 15:55:55.993769
1   905094  2015-03-06 15:55:55.993791
2   905154  2015-03-06 15:55:55.993803

In the expected result, consecutive rows should differ by 1 minute. For example:

Out[127]:
    column1     IST_DATE
0   905034  2015-03-06 15:53:55.993769
1   905094  2015-03-06 15:54:55.993791
2   905154  2015-03-06 15:55:55.993803

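The uptime is sampled once a minute. The last row of the data frame holds the number of seconds elapsed up to now. So, for example, 905154 means the server was started on February 24: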

>>> dt.datetime.today() - dt.timedelta(seconds=905154)
datetime.datetime(2015, 2, 24, 4, 40, 16, 28786)

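In other words, 'column1', which holds the number of seconds since the start time (February 24 in this example), should be converted to a readable date.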
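
The arithmetic described above can be sketched as follows. This is only an illustration under stated assumptions: `now` is pinned to a fixed value so the result is reproducible (in practice it would be `dt.datetime.today()`), and the names `uptimes`, `boot_time` and `sample_times` are made up for the example.

```python
import datetime as dt

import pandas as pd

# The three uptime samples (seconds since the server started)
uptimes = pd.Series([905034, 905094, 905154], name="column1")

# Assumption: a fixed "now" for reproducibility; normally dt.datetime.today()
now = dt.datetime(2015, 3, 6, 15, 55, 55)

# The last sample was taken "now", so the start time is now minus the last uptime
boot_time = now - dt.timedelta(seconds=int(uptimes.iloc[-1]))

# Each sample's wall-clock time is the start time plus that sample's uptime
sample_times = [boot_time + dt.timedelta(seconds=int(s)) for s in uptimes]

for seconds, stamp in zip(uptimes, sample_times):
    print(seconds, stamp)
```

With these inputs the start time falls on February 24 and the three rows come out one minute apart, which is the shape of the expected result shown earlier.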


Update

How do I find the last value of column1? I need to use that value (for example 905154) like this:

df['IST_DATE']=df['column1'].apply((lambda x: dt.datetime.today() - pd.Timedelta(905154,unit='s') + pd.Timedelta(x,unit='s')))

df[['column1','IST_DATE']]

Update 1

I tried something like this, but it does not work:

myval=df.tail(1)['column1']

df['IST_DATE']=df['column1'].apply((lambda x: dt.datetime.today() - pd.Timedelta(str(myval),unit='s') + pd.Timedelta(x,unit='s')))

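1 Answer:

Answer 0 (score: 0)

head and tail are not the right way to index pandas data. df.tail(1)['column1'] returns a one-element Series rather than a scalar, so str(myval) is not a plain number. Use .iloc[-1] to get the last value itself.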
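
A small sketch of the difference, using a made-up three-row frame in place of the real status file (the names `last_as_series` and `last_as_scalar` are illustrative only):

```python
import pandas as pd

# Stand-in for the real data: only the uptime column
df = pd.DataFrame({"column1": [905034, 905094, 905154]})

# tail(1) keeps the pandas container: a Series with one element
last_as_series = df.tail(1)["column1"]

# iloc[-1] selects by position and returns the value itself
last_as_scalar = df["column1"].iloc[-1]

print(type(last_as_series).__name__)  # Series
print(int(last_as_scalar))            # 905154

# str() of the Series also contains the index label, the name and the dtype,
# so it is not a string that represents a number of seconds
print(repr(str(last_as_series)))
```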

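import datetime as dt
import pandas as pd

# Splitting on single spaces yields 29 fields per line: column0 .. column28
u_cols = ['column' + str(i) for i in range(29)]

df = pd.read_csv('/root/status_success.txt', sep=' ', names=u_cols)

# .iloc[-1] returns the last uptime as a scalar, not a one-row Series
myval = df['column1'].iloc[-1]
# start time = now - last uptime; each row's time = start time + its own uptime
df['IST_DATE'] = df['column1'].apply(lambda x: dt.datetime.today() - pd.Timedelta(myval, unit='s') + pd.Timedelta(x, unit='s'))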
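
As a possible alternative (not part of the original answer), the same computation can be written without apply. The sketch below assumes a small in-memory frame instead of the status file, and reads the clock once so that every row shares the same reference time and consecutive rows differ by exactly 60 seconds:

```python
import pandas as pd

# Stand-in for the real data: only the uptime column
df = pd.DataFrame({"column1": [905034, 905094, 905154]})

# Read the clock once so all rows use the same reference time
now = pd.Timestamp.now()

# Last uptime as a scalar
last_uptime = df["column1"].iloc[-1]

# Each row is (last_uptime - its own uptime) seconds older than "now"
df["IST_DATE"] = now - pd.to_timedelta(last_uptime - df["column1"], unit="s")

print(df[["column1", "IST_DATE"]])
```

Here pd.to_timedelta converts the whole column of second offsets at once, so no Python-level lambda is called per row.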