Performance of creating a DatetimeIndex from Unix timestamps and adding a local time zone

Time: 2014-09-22 16:46:24

Tags: python datetime pandas

pandas version 0.14.1

I do the following:

import numpy as np
import dateutil
from pandas import DataFrame, DatetimeIndex
import time

cur_size = 1000000
columns = ['A', 'B', 'C', 'D', 'E', 'F']
mdf = np.empty(shape=(cur_size, len(columns)), dtype=np.float32)
# Nanosecond Unix timestamps, spaced 1 ms apart
idf = np.arange(1213424324300000000, 1213424324300000000 + cur_size * 1000000, 1000000, dtype=np.int64)
# fill in mdf, idf

index = DatetimeIndex(idf).tz_localize('UTC').tz_convert(dateutil.tz.tzlocal())
frame = DataFrame(mdf, columns = columns, index = index)
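
For readers unsure what the integers passed to DatetimeIndex represent, here is a minimal sketch using only the standard library. It is written for Python 3, unlike the Python 2 code above, and the helper name ns_to_utc is made up for illustration. It assumes the values are nanoseconds since the Unix epoch, which is how DatetimeIndex interprets plain int64 input.

```python
from datetime import datetime, timedelta, timezone

# Values taken from the question: first timestamp and step, both in nanoseconds.
START_NS = 1213424324300000000
STEP_NS = 1000000  # consecutive index entries are 1 ms apart

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ns_to_utc(ns):
    """Convert nanoseconds since the Unix epoch to an aware UTC datetime.

    Hypothetical helper. datetime only has microsecond resolution,
    so anything below one microsecond is dropped.
    """
    return EPOCH + timedelta(microseconds=ns // 1000)


first = ns_to_utc(START_NS)
second = ns_to_utc(START_NS + STEP_NS)

print(first.isoformat())   # first index entry, in UTC
print(second - first)      # spacing between consecutive entries
print(first.astimezone())  # same instant in this machine's local time zone
```

One million entries at 1 ms spacing cover only about 1000 seconds, so the index in the question stays within a single UTC offset.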

All of this is fast until I try to add a new column to the frame, for example:

start = time.time()
frame['dfd'] = 0
print 'took', time.time()-start

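This takes forever (10.59 seconds), but only the first time; adding more columns after that is fast again. The profiler shows pandas doing some really strange things, such as time zone conversions that should not be needed here: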

   4275752 function calls (4275746 primitive calls) in 6.461 seconds
   Ordered by: internal time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    6.503    6.503 string:2(<module>)
        1    0.000    0.000    6.503    6.503 frame.py:1994(__setitem__)
        1    0.000    0.000    6.499    6.499 indexing.py:1520(_convert_to_index_sliceable)
        1    0.000    0.000    6.499    6.499 index.py:1299(_get_string_slice)
     10/4    0.000    0.000    6.499    1.625 {getattr}
        1    0.001    0.001    6.499    6.499 index.py:1414(inferred_freq)
        1    0.000    0.000    6.498    6.498 frequencies.py:626(infer_freq)
        1    0.000    0.000    6.490    6.490 frequencies.py:694(__init__)
        1    0.000    0.000    6.489    6.489 frequencies.py:669(_tz_convert_with_transitions)
        1    0.006    0.006    6.489    6.489 function_base.py:1660(__call__)
        1    0.234    0.234    6.483    6.483 function_base.py:1746(_vectorize_call)
   534416    0.220    0.000    6.217    0.000 frequencies.py:676(<lambda>)
   534416    3.741    0.000    5.997    0.000 {pandas.tslib.tz_convert_single}
   534417    0.295    0.000    1.863    0.000 tz.py:107(utcoffset)
   534417    0.792    0.000    1.568    0.000 tz.py:123(_isdst)
   534417    0.701    0.000    0.701    0.000 {time.localtime}
   534417    0.232    0.000    0.393    0.000 tz.py:154(__eq__)
   534470    0.161    0.000    0.161    0.000 {isinstance}
   534417    0.074    0.000    0.074    0.000 {method 'toordinal' of 'datetime.date' objects}
       20    0.032    0.002    0.032    0.002 {numpy.core.multiarray.array}
        1    0.000    0.000    0.009    0.009 frequencies.py:716(get_freq)
        1    0.000    0.000    0.009    0.009 frequencies.py:708(deltas)
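
A listing like the one above can be produced with the standard cProfile and pstats modules. The following is only a sketch under stated assumptions: it is Python 3, and per_element_localtime is a hypothetical stand-in workload (one time.localtime call per element, mirroring the per-element calls visible in the profile), not the pandas code path itself. To profile the real case, replace the stand-in call with frame['dfd'] = 0.

```python
import cProfile
import io
import pstats
import time


def per_element_localtime(n):
    """Hypothetical stand-in workload: one time.localtime call per element."""
    base = 1213424324  # first timestamp from the question, in whole seconds
    return [time.localtime(base + i).tm_isdst for i in range(n)]


profiler = cProfile.Profile()
profiler.enable()
result = per_element_localtime(10000)  # replace with the statement to profile
profiler.disable()

# Sort by internal time ("tottime"), as in the listing above, and keep the top rows.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("tottime").print_stats(5)
report = buffer.getvalue()
print(report)
```

Sorting by tottime puts the functions that spend the most time in their own body at the top, which is what exposes the per-element tz_convert_single and time.localtime calls in the question's profile.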

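1 Answer:

Answer 0: (score: 1)

This is fixed in master / 0.15.0 (releasing in early October 2014). This is the closest issue I can remember: https://github.com/pydata/pandas/pull/7798

There have been quite a few fixes related to checking for DST transitions (which is the root of the problem); see the What's New for 0.15.0.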