I am using pandas to_datetime to format the timestamps of a DataFrame, as shown below:
import pandas as pd

# Unix timestamps in seconds, with matching price and volume columns
d = {'TIMESTAMP': pd.Series([1294311545, 1294317813, 1294318449]),
     'PRICE': pd.Series([24990, 25499, 25499]),
     'VOLUME': pd.Series([1500000000, 5000000000, 100000000])}
df = pd.DataFrame(d)
print(df)
# Interpret the integers as seconds since the epoch (the result is tz-naive)
df.TIMESTAMP = pd.to_datetime(df.TIMESTAMP, unit='s')
df.set_index('TIMESTAMP', inplace=True)
print(df)
# Hourly aggregates; the old resample(..., how=...) form was removed from pandas
test = df['VOLUME'].resample('H').sum()
print(test)
test2 = df['PRICE'].resample('H').ohlc()
print(test2)
Output 1:
   PRICE   TIMESTAMP      VOLUME
0  24990  1294311545  1500000000
1  25499  1294317813  5000000000
2  25499  1294318449   100000000
                     PRICE      VOLUME
TIMESTAMP
2011-01-06 10:59:05  24990  1500000000
2011-01-06 12:43:33  25499  5000000000
2011-01-06 12:54:09  25499   100000000
Next, I used Python's datetime module to print the same timestamps from the DataFrame above:
import datetime
print(datetime.datetime.fromtimestamp(int("1294311545")).strftime('%Y-%m-%d %H:%M:%S'))
print(datetime.datetime.fromtimestamp(int("1294317813")).strftime('%Y-%m-%d %H:%M:%S'))
print(datetime.datetime.fromtimestamp(int("1294318449")).strftime('%Y-%m-%d %H:%M:%S'))
Output 2:
2011-01-06 02:59:05
2011-01-06 04:43:33
2011-01-06 04:54:09
As you can see, output 1 and output 2 differ. Is this a time zone issue? I need output 1 to match output 2. How can I fix it?
Answer 0 (score: 1)
fromtimestamp converts the timestamp to your machine's local time zone. Use utcfromtimestamp to get the same result as pandas, which does no localization here:
In [22]: df.index[0]
Out[22]: Timestamp('2011-01-06 10:59:05', tz=None)
In [24]: datetime.datetime.utcfromtimestamp(int("1294311545")).strftime('%Y-%m-%d %H:%M:%S')
Out[24]: '2011-01-06 10:59:05'
In [25]: datetime.datetime.fromtimestamp(int("1294311545")).strftime('%Y-%m-%d %H:%M:%S')
Out[25]: '2011-01-06 05:59:05'
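As a rough sketch of the same comparison on current Python 3: utcfromtimestamp is deprecated since Python 3.12, and passing tz=timezone.utc to fromtimestamp gives the same UTC wall-clock value as a timezone-aware object. The variable names below are illustrative only, and the local-time line prints whatever zone the machine is set to.

```python
from datetime import datetime, timezone

ts = 1294311545  # first timestamp from the question

# Aware UTC datetime: same wall-clock value pandas shows for unit='s'.
utc_dt = datetime.fromtimestamp(ts, tz=timezone.utc)
utc_text = utc_dt.strftime('%Y-%m-%d %H:%M:%S')
print(utc_text)

# Convert to the machine's local zone; the printed value depends on
# where this runs, which is why the two outputs in the question differ.
local_dt = utc_dt.astimezone()
print(local_dt.strftime('%Y-%m-%d %H:%M:%S %z'))
```

Both objects refer to the same instant; only the displayed wall-clock time changes.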
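If you want to shift the index to another time zone, you can do this. Note that it treats the naive index values as Asia/Shanghai (UTC+8) and then converts them to UTC, which moves them back by 8 hours: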
In [59]: df.index = df.index.tz_localize('Asia/Shanghai').tz_convert('UTC')
In [60]: df
Out[60]:
                           PRICE      VOLUME
TIMESTAMP
2011-01-06 02:59:05+00:00  24990  1500000000
2011-01-06 04:43:33+00:00  25499  5000000000
2011-01-06 04:54:09+00:00  25499   100000000
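The tz_localize('Asia/Shanghai').tz_convert('UTC') call gives the digits from output 2, but labels them +00:00 even though they are not UTC times. A possibly cleaner alternative is to mark the index as UTC and convert it to the actual local zone. The sketch below assumes the asker's zone is America/Los_Angeles, which is only inferred from the 8-hour gap between the two outputs; substitute your own zone name.

```python
import pandas as pd

df = pd.DataFrame({
    "TIMESTAMP": [1294311545, 1294317813, 1294318449],
    "PRICE": [24990, 25499, 25499],
})

# utc=True makes the parsed timestamps timezone-aware (UTC), not naive.
df["TIMESTAMP"] = pd.to_datetime(df["TIMESTAMP"], unit="s", utc=True)
df = df.set_index("TIMESTAMP")

# Assumed zone: chosen only because output 2 is 8 hours behind UTC.
local_index = df.index.tz_convert("America/Los_Angeles")

# Optional: drop the offset to keep naive local wall-clock times.
naive_local = local_index.tz_localize(None)

df.index = naive_local
print(df)
```

Keeping the index timezone-aware (skipping the tz_localize(None) step) is usually safer if the data will later be compared with timestamps from other zones.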