pandas version 0.14.1
I do the following:
import numpy as np
import dateutil
from pandas import DataFrame, DatetimeIndex
import time
cur_size = 1000000
columns = ['A', 'B', 'C', 'D', 'E', 'F']
mdf = np.empty(shape=(cur_size, len(columns)), dtype=np.float32)
idf = np.empty(cur_size,dtype=np.int64)
idf = xrange(1213424324300000000,1213424324300000000+cur_size*1000000, 1000000)
# fill in mdf,idf
index = DatetimeIndex(idf).tz_localize('UTC').tz_convert(dateutil.tz.tzlocal())
frame = DataFrame(mdf, columns = columns, index = index)
All of this is fast until I try to add a new column to the frame, for example:
start = time.time()
frame['dfd'] = 0
print 'took', time.time()-start
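This takes forever (10.59 seconds), but only the first time; adding more columns after that is fast again. The profiler shows pandas doing something very strange, namely a timezone conversion that should not be happening here: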
4275752 function calls (4275746 primitive calls) in 6.461 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 6.503 6.503 string:2(<module>)
1 0.000 0.000 6.503 6.503 frame.py:1994(__setitem__)
1 0.000 0.000 6.499 6.499 indexing.py:1520(_convert_to_index_sliceable)
1 0.000 0.000 6.499 6.499 index.py:1299(_get_string_slice)
10/4 0.000 0.000 6.499 1.625 {getattr}
1 0.001 0.001 6.499 6.499 index.py:1414(inferred_freq)
1 0.000 0.000 6.498 6.498 frequencies.py:626(infer_freq)
1 0.000 0.000 6.490 6.490 frequencies.py:694(__init__)
1 0.000 0.000 6.489 6.489 frequencies.py:669(_tz_convert_with_transitions)
1 0.006 0.006 6.489 6.489 function_base.py:1660(__call__)
1 0.234 0.234 6.483 6.483 function_base.py:1746(_vectorize_call)
534416 0.220 0.000 6.217 0.000 frequencies.py:676(<lambda>)
534416 3.741 0.000 5.997 0.000 {pandas.tslib.tz_convert_single}
534417 0.295 0.000 1.863 0.000 tz.py:107(utcoffset)
534417 0.792 0.000 1.568 0.000 tz.py:123(_isdst)
534417 0.701 0.000 0.701 0.000 {time.localtime}
534417 0.232 0.000 0.393 0.000 tz.py:154(__eq__)
534470 0.161 0.000 0.161 0.000 {isinstance}
534417 0.074 0.000 0.074 0.000 {method 'toordinal' of 'datetime.date' objects}
20 0.032 0.002 0.032 0.002 {numpy.core.multiarray.array}
1 0.000 0.000 0.009 0.009 frequencies.py:716(get_freq)
1 0.000 0.000 0.009 0.009 frequencies.py:708(deltas)
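For anyone who wants a report in the same layout, here is a minimal sketch using only the standard library `cProfile` and `pstats` modules, written for Python 3. The helper name `profile_by_internal_time` and the workload `slow_stand_in` are hypothetical; in the real case the profiled call would be the `frame['dfd'] = 0` assignment.

```python
import cProfile
import io
import pstats


def profile_by_internal_time(func, limit=15):
    """Run func() under cProfile and return the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # pstats labels the "tottime" sort key as "internal time" in the header.
    stats.sort_stats("tottime").print_stats(limit)
    return buffer.getvalue()


def slow_stand_in():
    # Hypothetical workload standing in for `frame['dfd'] = 0`.
    return sum(i * i for i in range(100000))


report = profile_by_internal_time(slow_stand_in)
print(report)
```

Sorting by `tottime` puts the functions that burn the most time in their own bodies at the top, while the `cumtime` column shows which caller the time is charged to.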
Answer 0 (score: 1)
This is fixed in master / 0.15.0 (released in early October 2014). This is the closest issue I can remember: https://github.com/pydata/pandas/pull/7798.
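A quick way to tell whether an installed version predates the fix is to compare version strings. This is only a sketch: the helper names `version_tuple` and `has_dst_fix` are made up for illustration, and it assumes every release from 0.15.0 onward carries the fix.

```python
import re


def version_tuple(version):
    """Turn '0.14.1' (or '0.15.0rc1') into a comparable tuple of three ints."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad short versions such as '0.15' so tuples compare element by element.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def has_dst_fix(version, fixed_in="0.15.0"):
    # Assumption: any release at or after 0.15.0 includes the fix.
    return version_tuple(version) >= version_tuple(fixed_in)


print(has_dst_fix("0.14.1"))
print(has_dst_fix("0.15.0"))
```

In practice you would pass `pandas.__version__` as the argument.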
There were quite a few fixes related to the DST transition checks (which are the root of the problem); see the What's New notes for 0.15.0.
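As for why only the first assignment is slow: the profile shows the time going into `inferred_freq`, reached through `getattr`. One plausible reading, which is an assumption and not something confirmed in this thread, is that the value is computed lazily and then cached on the index, so later column assignments reuse it. The class `LazyIndex` below is a hypothetical stand-in that shows the pattern, not pandas code.

```python
_MISSING = object()


class LazyIndex(object):
    """Hypothetical stand-in for an index with a costly, cached attribute."""

    def __init__(self, values):
        self.values = values
        self._freq_cache = _MISSING
        self.computations = 0

    @property
    def inferred_freq(self):
        if self._freq_cache is _MISSING:
            self.computations += 1
            # Expensive pass over every element; it stands in for the
            # per-element timezone conversion seen in the profile.
            deltas = set(b - a for a, b in zip(self.values, self.values[1:]))
            self._freq_cache = deltas.pop() if len(deltas) == 1 else None
        return self._freq_cache


index = LazyIndex(list(range(0, 1000000, 1000)))
first = index.inferred_freq   # pays the full cost
second = index.inferred_freq  # served from the cache
print(first, second, index.computations)
```

Under that assumption the 10 seconds is a one-off cost per index object, which matches the behaviour described in the question.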