I built two functions, major_check_with_dataframe and major_check_with_list, and I want to see which one runs faster. I am confused by their run times.
import numpy as np
DTYPE_FLOAT = np.float
import anapyfunc.major
import pandas as pd
from testpython.timer import timer
# wrapper function
def major_check(**kwargs):
    #return major_check_with_list(**kwargs)
    return major_check_with_dataframe(**kwargs)
# functions to be timed
def major_check_with_dataframe(df, s_major_single_hi_percent, s_major_single_lo_percent):
    ...
def major_check_with_list(source_list, s_major_single_hi_percent, s_major_single_lo_percent):
    ...
# main function starts here
t = timer.Timer(verbose = True, run = False)
t.set_name(name = 'major check timer')
a = np.random.choice(101, 2500)
b = np.random.choice(101, 2500)
c = np.random.choice(101, 2500)
s_major_single_hi_percent = 70
s_major_single_lo_percent = 10
dd = {'a' : a, 'b' : b , 'c' : c}
df = pd.DataFrame(dd)
# axis 0 = tick
# axis 1 = input arrays
t.set_name(name = 'major_check_with_dataframe')
t.start()
ret1 = anapyfunc.major.major_check_with_dataframe(
    df = df,
    s_major_single_hi_percent = s_major_single_hi_percent,
    s_major_single_lo_percent = s_major_single_lo_percent,
)
t.stop_reset()
t.set_name(name = 'major_check_with_list')
t.start()
ret2 = anapyfunc.major.major_check_with_list(
    source_list = [a, b, c,],
    s_major_single_hi_percent = s_major_single_hi_percent,
    s_major_single_lo_percent = s_major_single_lo_percent,
)
t.stop_reset()
t.set_name(name = 'major_check')
t.start()
ret3 = anapyfunc.major.major_check(
    #source_list = [a, b, c,],
    df = df,
    s_major_single_hi_percent = s_major_single_hi_percent,
    s_major_single_lo_percent = s_major_single_lo_percent,
)
t.stop_reset()
Output when major_check calls major_check_with_dataframe:
major_check_with_dataframe elapsed time: 94.261000 ms
major_check_with_list elapsed time: 2.316000 ms
major_check elapsed time: 3.055000 ms
Output when major_check calls major_check_with_list:
major_check_with_dataframe elapsed time: 95.042000 ms
major_check_with_list elapsed time: 2.240000 ms
major_check elapsed time: 2.240000 ms
I found that if I run major_check_with_dataframe a second time, its run time drops to almost the same as when it is run through the major_check wrapper function.
t.set_name(name = 'major_check_with_dataframe')
t.start()
ret1 = anapyfunc.major.major_check_with_dataframe(
    df = df,
    s_major_single_hi_percent = s_major_single_hi_percent,
    s_major_single_lo_percent = s_major_single_lo_percent,
)
t.stop_reset()
t.set_name(name = 'major_check_with_list')
t.start()
ret2 = anapyfunc.major.major_check_with_list(
    source_list = [a, b, c,],
    s_major_single_hi_percent = s_major_single_hi_percent,
    s_major_single_lo_percent = s_major_single_lo_percent,
)
t.stop_reset()
t.set_name(name = 'major_check')
t.start()
ret3 = anapyfunc.major.major_check(
    #source_list = [a, b, c,],
    df = df,
    s_major_single_hi_percent = s_major_single_hi_percent,
    s_major_single_lo_percent = s_major_single_lo_percent,
)
t.stop_reset()
t.set_name(name = 'major_check_with_dataframe')
t.start()
ret1 = anapyfunc.major.major_check_with_dataframe(
    df = df,
    s_major_single_hi_percent = s_major_single_hi_percent,
    s_major_single_lo_percent = s_major_single_lo_percent,
)
t.stop_reset()
t.set_name(name = 'major_check')
t.start()
ret3 = anapyfunc.major.major_check(
    #source_list = [a, b, c,],
    df = df,
    s_major_single_hi_percent = s_major_single_hi_percent,
    s_major_single_lo_percent = s_major_single_lo_percent,
)
t.stop_reset()
Output:
major_check_with_dataframe elapsed time: 95.608000 ms
major_check_with_list elapsed time: 2.350000 ms
major_check elapsed time: 3.048000 ms
major_check_with_dataframe elapsed time: 2.569000 ms
major_check elapsed time: 2.520000 ms
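Could this be some kind of memory caching effect? The behavior is the same even if I put the functions in a class, run the function once through a class object, and delete/garbage-collect the object after each run.
All functions correctly return the expected values. What am I missing here? The versions I am using are:
Python 3.4.3 (default, Nov 17 2016, 01:08:31)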
pandas 0.21.0
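
To check whether the slow first run is a one-time, first-call cost rather than something specific to the function itself, the same call could be timed several times in a row and the per-call times compared. Below is a minimal sketch of that idea using only the standard library. The names time_calls and hypothetical_check are made up for illustration; hypothetical_check is just a stand-in, since the real function bodies are not shown here, and it would be swapped for anapyfunc.major.major_check_with_dataframe or major_check_with_list in practice.

```python
import time


def time_calls(func, repeats=5, **kwargs):
    """Call func(**kwargs) `repeats` times; return each call's elapsed time in ms."""
    timings_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(**kwargs)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return timings_ms


def hypothetical_check(values, hi_percent, lo_percent):
    # Stand-in for the real functions (assumption: their bodies are not shown
    # in this post, so this only gives the timer something to call).
    return [lo_percent <= v <= hi_percent for v in values]


timings = time_calls(
    hypothetical_check,
    repeats=5,
    values=list(range(2500)),
    hi_percent=70,
    lo_percent=10,
)
for i, ms in enumerate(timings, start=1):
    print("call {}: {:.3f} ms".format(i, ms))
```

If only the first entry is large and the rest are small and similar, the extra time is probably spent once per process on the first call, not on every call to the function.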