Python 3 function with pandas DataFrame input runs slower the first time and faster the second time

Date: 2018-01-03 22:26:14

Tags: python python-3.x performance function pandas

I wrote two functions, major_check_with_dataframe and major_check_with_list, and wanted to see which one runs faster. Their run times confuse me.

import numpy as np
DTYPE_FLOAT = float  # np.float was only an alias of the builtin float; it was removed in NumPy 1.24
import anapyfunc.major
import pandas as pd
from testpython.timer import timer

# wrapper function
def major_check(**kwargs):
    #return major_check_with_list(**kwargs)
    return major_check_with_dataframe(**kwargs)

# functions to be timed
def major_check_with_dataframe(df, s_major_single_hi_percent, s_major_single_lo_percent):
    ...

def major_check_with_list(source_list, s_major_single_hi_percent, s_major_single_lo_percent):
    ...

# main function starts here
t = timer.Timer(verbose = True, run = False)
t.set_name(name = 'major check timer')

a = np.random.choice(101, 2500)
b = np.random.choice(101, 2500)
c = np.random.choice(101, 2500)

s_major_single_hi_percent = 70 
s_major_single_lo_percent = 10

dd = {'a' : a, 'b' : b , 'c' : c}
df = pd.DataFrame(dd)

# axis 0 = tick
# axis 1 = input arrays

t.set_name(name = 'major_check_with_dataframe')
t.start()
ret1 = anapyfunc.major.major_check_with_dataframe(
                                  df = df, 
                                  s_major_single_hi_percent = s_major_single_hi_percent,
                                  s_major_single_lo_percent = s_major_single_lo_percent,
                                 )
t.stop_reset()

t.set_name(name = 'major_check_with_list')
t.start()
ret2 = anapyfunc.major.major_check_with_list(
                                source_list = [a, b, c,],
                                  s_major_single_hi_percent = s_major_single_hi_percent,
                                  s_major_single_lo_percent = s_major_single_lo_percent,
                                )
t.stop_reset()


t.set_name(name = 'major_check')
t.start()
ret3 = anapyfunc.major.major_check(
                                  #source_list = [a, b, c,],
                                  df = df, 
                                  s_major_single_hi_percent = s_major_single_hi_percent,
                                  s_major_single_lo_percent = s_major_single_lo_percent,
                                 )
t.stop_reset()
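The Timer class comes from the asker's own testpython.timer module, which is not shown in the post. To reproduce the measurements without it, a minimal stand-in could look like the sketch below. It is hypothetical: it only mirrors the calls used in the script (the constructor's verbose/run arguments, set_name, start, stop_reset) and assumes stop_reset prints the elapsed time in the same format as the output shown; the real class may behave differently.

```python
import time


class Timer:
    """Hypothetical stand-in for testpython.timer.Timer (not the real class)."""

    def __init__(self, verbose=True, run=False):
        self.verbose = verbose
        self.name = ""
        self._start = None
        self.elapsed_ms = None
        if run:
            self.start()

    def set_name(self, name):
        self.name = name

    def start(self):
        # perf_counter is a monotonic, high-resolution clock intended for timing
        self._start = time.perf_counter()

    def stop_reset(self):
        # Record the elapsed time, optionally print it, then clear the start mark
        self.elapsed_ms = (time.perf_counter() - self._start) * 1000.0
        if self.verbose:
            print("%s elapsed time: %f ms" % (self.name, self.elapsed_ms))
        self._start = None
        return self.elapsed_ms
```

%-formatting is used instead of f-strings so the sketch also runs on the Python 3.4 mentioned in the question.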

Output when major_check calls major_check_with_dataframe:
major_check_with_dataframe elapsed time: 94.261000 ms
major_check_with_list elapsed time: 2.316000 ms
major_check elapsed time: 3.055000 ms

Output when major_check calls major_check_with_list:
major_check_with_dataframe elapsed time: 95.042000 ms
major_check_with_list elapsed time: 2.240000 ms
major_check elapsed time: 2.240000 ms

I noticed that if I run major_check_with_dataframe a second time, its run time drops to almost the same value as when it is called through the major_check wrapper function:
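If the extra ~90 ms is a one-time cost paid on the first pandas-heavy call in the process (for example, lazy imports or internal caches filled on first use; this is a guess, not something the post confirms), it should appear only in the first of several consecutive calls. A sketch to check that, using a hypothetical workload column_share_above as a stand-in because the real function bodies are elided:

```python
import time

import numpy as np
import pandas as pd


def column_share_above(df, hi_percent):
    # Hypothetical DataFrame workload; NOT the asker's major_check code.
    # For each row, the fraction of columns whose value exceeds hi_percent.
    return (df > hi_percent).mean(axis=1)


def time_calls(func, n_calls, *args, **kwargs):
    """Time n_calls consecutive calls of func and return the durations in ms."""
    durations = []
    for _ in range(n_calls):
        start = time.perf_counter()
        func(*args, **kwargs)
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


df = pd.DataFrame({name: np.random.choice(101, 2500) for name in "abc"})
timings = time_calls(column_share_above, 5, df, hi_percent=70)
for i, ms in enumerate(timings, 1):
    print("call %d: %.3f ms" % (i, ms))
```

If only call 1 is noticeably slower, the difference is a warm-up effect rather than a property of the function itself.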

t.set_name(name = 'major_check_with_dataframe')
t.start()
ret1 = anapyfunc.major.major_check_with_dataframe(
                                  df = df, 
                                  s_major_single_hi_percent = s_major_single_hi_percent,
                                  s_major_single_lo_percent = s_major_single_lo_percent,
                                 )
t.stop_reset()

t.set_name(name = 'major_check_with_list')
t.start()
ret2 = anapyfunc.major.major_check_with_list(
                                source_list = [a, b, c,],
                                  s_major_single_hi_percent = s_major_single_hi_percent,
                                  s_major_single_lo_percent = s_major_single_lo_percent,
                                )
t.stop_reset()


t.set_name(name = 'major_check')
t.start()
ret3 = anapyfunc.major.major_check(
                                  #source_list = [a, b, c,],
                                  df = df, 
                                  s_major_single_hi_percent = s_major_single_hi_percent,
                                  s_major_single_lo_percent = s_major_single_lo_percent,
                                 )
t.stop_reset()

t.set_name(name = 'major_check_with_dataframe')
t.start()
ret1 = anapyfunc.major.major_check_with_dataframe(
                                  df = df, 
                                  s_major_single_hi_percent = s_major_single_hi_percent,
                                  s_major_single_lo_percent = s_major_single_lo_percent,
                                 )
t.stop_reset()

t.set_name(name = 'major_check')
t.start()
ret3 = anapyfunc.major.major_check(
                                  #source_list = [a, b, c,],
                                  df = df, 
                                  s_major_single_hi_percent = s_major_single_hi_percent,
                                  s_major_single_lo_percent = s_major_single_lo_percent,
                                 )
t.stop_reset()

Output:

major_check_with_dataframe elapsed time: 95.608000 ms
major_check_with_list elapsed time: 2.350000 ms
major_check elapsed time: 3.048000 ms
major_check_with_dataframe elapsed time: 2.569000 ms
major_check elapsed time: 2.520000 ms

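Could this be some kind of memory caching effect? The behavior is the same even if I put the functions in a class, run the function once through a class instance, and delete and garbage-collect that instance after each run.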
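If the first-call cost lives at module or process level (imported modules, caches inside pandas or NumPy), deleting and garbage-collecting the asker's own objects would not be expected to reset it, which would fit this observation; that is an inference, not something the post establishes. A common way to separate one-time costs from the steady-state cost is to make one untimed warm-up call and then take the best of several repeated timings, as timeit does. The sketch below assumes that approach; share_above_df and share_above_list are hypothetical stand-ins for the elided function bodies, not the asker's code.

```python
import timeit

import numpy as np
import pandas as pd


def share_above_df(df, hi_percent):
    # Hypothetical DataFrame version of a per-row check
    return (df > hi_percent).mean(axis=1).values


def share_above_list(source_list, hi_percent):
    # Hypothetical NumPy version operating on a list of 1-D arrays
    stacked = np.column_stack(source_list)
    return (stacked > hi_percent).mean(axis=1)


def benchmark(func, repeat=5, number=20, **kwargs):
    """Return the best per-call time in ms after one untimed warm-up call."""
    func(**kwargs)  # warm-up: pay any first-call costs outside the measurement
    runs = timeit.repeat(lambda: func(**kwargs), repeat=repeat, number=number)
    return min(runs) / number * 1000.0


a, b, c = (np.random.choice(101, 2500) for _ in range(3))
df = pd.DataFrame({"a": a, "b": b, "c": c})

# Both variants should agree before their speed is compared
assert np.allclose(share_above_df(df, 70), share_above_list([a, b, c], 70))

print("DataFrame: %.3f ms" % benchmark(share_above_df, df=df, hi_percent=70))
print("list:      %.3f ms" % benchmark(share_above_list, source_list=[a, b, c], hi_percent=70))
```

Taking the minimum rather than the mean follows the timeit documentation's advice: higher values are usually caused by other processes interfering, not by the code under test.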

All functions return the expected values. What am I missing here? The versions I am using are:

Python 3.4.3 (default, Nov 17 2016, 01:08:31)

pandas 0.21.0

0 answers