Fast way to localize times for a large dataset in Python?

Time: 2018-04-27 17:52:52

Tags: python pandas numpy datetime

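I have datetime objects as the index of a pandas DataFrame, and I would like to localize them without using a for loop. Here is the code (data is the DataFrame):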

from pytz import timezone
utc = timezone('UTC')
utc_times = [utc.localize(entry) for entry in data.index]
cst_times = [entry.astimezone(timezone('US/Central')) for entry in utc_times]
data.index = cst_times

This gets slow as the dataset grows. Is there any way to speed it up?

1 answer:

Answer 0: (score: 2)

If your index is a DatetimeIndex, you should be able to do this:

import pandas as pd
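# pd.date_range replaces the DatetimeIndex(start=..., periods=..., freq=...) constructor form, which newer pandas versions no longer accept
times = pd.date_range(start='2018-04-26 11:00:00', periods=50000, freq='1h')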
data = pd.DataFrame(index=times)
utc_times = data.index.tz_localize('UTC')
cst_times = utc_times.tz_convert('US/Central')
data.index = cst_times
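
To make the two steps concrete, here is a minimal sketch on a tiny index. The names frame, utc_index, and central_index are made up for illustration, and it assumes the naive timestamps represent UTC, as in the question. tz_localize attaches a time zone without changing the wall-clock values, while tz_convert keeps the same instants and only changes how they are displayed.

```python
import pandas as pd

# Hypothetical three-row frame with a naive (timezone-unaware) index.
naive = pd.date_range(start="2018-04-26 11:00:00", periods=3, freq="1h")
frame = pd.DataFrame({"value": [1, 2, 3]}, index=naive)
print(frame.index.tz)  # None

# Step 1: declare that the naive values are UTC (wall-clock values unchanged).
utc_index = frame.index.tz_localize("UTC")

# Step 2: express the same instants in US/Central wall-clock time.
central_index = utc_index.tz_convert("US/Central")

frame.index = central_index
print(frame.index[0])  # 2018-04-26 06:00:00-05:00
```

Both calls operate on the whole index at once, so no Python-level loop is needed.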

For an index of 50,000 timestamps, this method is more than 1,000 times faster (1.49 s versus 389 µs wall time). See below:

%%time
# Original method
utc_times = [utc.localize(entry) for entry in data.index]
cst_times = [entry.astimezone(timezone('US/Central')) for entry in utc_times]
data.index = cst_times

CPU times: user 1.28 s, sys: 38.2 ms, total: 1.32 s
Wall time: 1.49 s

-

%%time
# New method
utc_times = data.index.tz_localize('UTC')
cst_times = utc_times.tz_convert('US/Central')
data.index = cst_times

CPU times: user 354 µs, sys: 9 µs, total: 363 µs
Wall time: 389 µs
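
Outside of a notebook, where the %%time magic is not available, a comparison along these lines can be sketched with time.perf_counter. This is an illustration under assumptions, not the benchmark above: the size n is arbitrary, and the loop uses pandas Timestamp methods rather than the pytz calls from the question, so absolute numbers will differ from those shown.

```python
import time

import pandas as pd

# Hypothetical benchmark size, smaller than the 50,000 used above.
n = 5000
index = pd.date_range(start="2018-04-26 11:00:00", periods=n, freq="1h")

# Element-by-element conversion (stand-in for the original list comprehensions).
start = time.perf_counter()
looped = pd.DatetimeIndex(
    [ts.tz_localize("UTC").tz_convert("US/Central") for ts in index]
)
loop_seconds = time.perf_counter() - start

# Vectorized conversion on the whole index.
start = time.perf_counter()
vectorized = index.tz_localize("UTC").tz_convert("US/Central")
vectorized_seconds = time.perf_counter() - start

# Both approaches should produce the same timestamps.
print("same result:", vectorized.equals(looped))
print(f"loop: {loop_seconds:.6f} s, vectorized: {vectorized_seconds:.6f} s")
```

Timings vary by machine and pandas version, so treat any single run as a rough indication only.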