
Self-writing Python code (now with an MCVE!)

Posted: 2016-03-11 00:42:43

Tags: python optimization compiler-optimization

I have a program written in Python where the user supplies command-line arguments that specify which statistics should be computed on some data.

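Originally, I wrote code that could take N stats in X combinations and compute the results. However, I found that it was always much faster if I wrote the code for one specific combination of stats by hand. So I then wrote code that generates the Python I would have written by hand and runs it with exec(), and this works very well. Ideally, I would like to find a way to get the same performance as a hand-rewritten loop, but without needing all of my functions to be strings!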
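The "write the loop I would have written by hand, then exec() it" approach can be sketched in Python 3 roughly as follows. This is an illustration under stated assumptions, not the asker's code: the table STAT_EXPRS and the function build_loop are hypothetical names, and each stat is assumed to be a single Python expression in terms of a variable v, with stat names that are valid identifiers. The generated loop is wrapped in a function so that the stat variables are fast local variables rather than module-level globals.

```python
import collections

# Hypothetical stat table: each stat is one Python expression in terms of `v`.
STAT_EXPRS = {
    "STAT1": "v ** 3",
    "STAT2": "v * v",
    "STAT3": "v + v",
}

def build_loop(groups):
    """Generate and compile a loop specialised for `groups` (a list of lists of stat names)."""
    needed = sorted({name for group in groups for name in group})
    lines = ["def run(loops, data):",
             "    for v in range(loops):"]
    # Each distinct stat is computed exactly once per value.
    for name in needed:
        lines.append(f"        {name} = {STAT_EXPRS[name]}")
    # One counter update per group; the trailing comma keeps one-element keys as tuples.
    for i, group in enumerate(groups):
        key = ", ".join(group) + ","
        lines.append(f"        data[{i}][({key})] += 1")
    source = "\n".join(lines)
    namespace = {}
    exec(source, namespace)  # defines `run` inside `namespace`
    return namespace["run"], source

groups = [["STAT1", "STAT2"], ["STAT3"]]
run, source = build_loop(groups)
data = [collections.defaultdict(int) for _ in groups]
run(10, data)
print(source)
```

Printing source shows the generated function, which has the same shape as option 2 in the MCVE.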

The following code is a minimal, complete, and verifiable example (MCVE) that demonstrates the problem.

import time
import argparse
import collections

parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter,
    description="Demonstration that it is sometimes much faster to use exec() than to not.")
parser.add_argument("--stat", nargs='+', metavar='', action='append',
    help='Supply a list of stats to run here. You can use --stat more than once to make multiple groups.')
args = parser.parse_args()

allStats = {}
class stat1:
    def __init__(self):
        def process(someValue):
            return someValue**3
        self.calculate = process
allStats['STAT1'] = stat1()

class stat2:
    def __init__(self):
        def process(someValue):
            return someValue*someValue
        self.calculate = process
allStats['STAT2'] = stat2()

class stat3:
    def __init__(self):
        def process(someValue):
            return someValue+someValue
        self.calculate = process
allStats['STAT3'] = stat3()

allStatsString = {}
allStatsString['STAT1'] = 'STAT1 = someValue**3'
allStatsString['STAT2'] = 'STAT2 = someValue*someValue'
allStatsString['STAT3'] = 'STAT3 = someValue+someValue'

stats_to_run = set()                                                   # stats_to_run is a set of the stats the user wants to run, irrespective of grouping.
data = [collections.defaultdict(int) for x in range(0,len(args.stat))] # data is a list of dictionaries. One dictionary for each --stat group.
for group in args.stat:
    stats_to_run.update(group)
    for stat in group:
        if stat not in allStats:
            print "I'm sorry Dave, I'm afraid I can't do that."; exit()

loops = 9000000
option = 1  # 1 = generic loop, 2 = hand-written loop, 3 = generated loop run with exec()
startTime = time.time()
if option == 1:
    results = dict.fromkeys(stats_to_run)
    for someValue in xrange(0,loops):
        for analysis in stats_to_run:
            results[analysis] = allStats[analysis].calculate(someValue)
        for a, analysis in enumerate(args.stat):
            data[a][tuple([ results[stat] for stat in analysis ])] += 1

elif option == 2:
    # Hand-written for --stat STAT1 STAT2 --stat STAT3 only.
    for someValue in xrange(0,loops):
        STAT1 = someValue**3
        STAT2 = someValue*someValue
        STAT3 = someValue+someValue
        data[0][(STAT1,STAT2)] += 1  # Store the first result group
        data[1][(STAT3,)] += 1       # Store the second result group

else:  # option 3: build the hand-written loop as source code and exec() it
    execute = 'for someValue in xrange(0,loops):'
    for analysis in stats_to_run:
        execute += '\n    ' + allStatsString[analysis]
    for a, analysis in enumerate(args.stat):
        if len(analysis) == 1: 
            execute += '\n    data[' + str(a) + '][('+ analysis[0] + ',)] += 1'
        else: 
            execute += '\n    data[' + str(a) + '][('+ ','.join(analysis) + ')] += 1'
    print execute
    exec(execute)

## This bottom bit just adds all these numbers up so we get a single value to compare the different methods with (to make sure they are the same)
total = 0
for group in data:
    for stats in group:
        total += sum(stats)
print total
print time.time() - startTime
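One way to keep option 1's flexibility without turning the stats into strings is to hoist every loop-invariant lookup out of the hot loop. The following Python 3 sketch is an illustration, not something from the question or the answer, and STAT_FUNCS and run_hoisted are hypothetical names. It resolves the stat functions and each group's indices once, so the inner loop performs no dictionary or attribute lookups by stat name. It still makes one Python function call per stat per value, so it will probably remain slower than the generated code. Its speed has not been measured here.

```python
import collections

# Hypothetical stand-ins for the question's allStats[...].calculate functions.
STAT_FUNCS = {
    "STAT1": lambda v: v ** 3,
    "STAT2": lambda v: v * v,
    "STAT3": lambda v: v + v,
}

def run_hoisted(groups, loops):
    """Generic loop with all name lookups done once, before the hot loop."""
    names = sorted({name for group in groups for name in group})
    funcs = [STAT_FUNCS[name] for name in names]  # resolved once, not per iteration
    position = {name: i for i, name in enumerate(names)}
    data = [collections.defaultdict(int) for _ in groups]
    # Pair each group's counter with the list positions of its stats.
    plan = [(counter, [position[name] for name in group])
            for counter, group in zip(data, groups)]
    for v in range(loops):
        results = [f(v) for f in funcs]  # each distinct stat computed once per value
        for counter, indices in plan:
            counter[tuple([results[i] for i in indices])] += 1
    return data

data = run_hoisted([["STAT1", "STAT2"], ["STAT3"]], 10)
```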

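If the script is run as python test.py --stat STAT1 STAT2 --stat STAT3, the average run times are:

Option 1: 92 s
Option 2: 56 s
Option 3: 54 s (not surprising, since it is essentially the same code as option 2).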
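A comparison like this can be reproduced at a smaller scale with the standard timeit module. The following Python 3 sketch re-implements option 1 and option 2 as functions (generic and hand_written are hypothetical names), checks that both produce identical counts, and then takes the best of three runs for each. Absolute times depend on the machine and the Python version, so they will not match the numbers above.

```python
import collections
import timeit

def generic(loops, groups, funcs):
    # Mirrors option 1: look up and call every stat by name on every iteration.
    data = [collections.defaultdict(int) for _ in groups]
    stats = {name for group in groups for name in group}
    results = {}
    for v in range(loops):
        for name in stats:
            results[name] = funcs[name](v)
        for a, group in enumerate(groups):
            data[a][tuple([results[s] for s in group])] += 1
    return data

def hand_written(loops):
    # Mirrors option 2: specialised for --stat STAT1 STAT2 --stat STAT3.
    data = [collections.defaultdict(int), collections.defaultdict(int)]
    for v in range(loops):
        s1 = v ** 3
        s2 = v * v
        s3 = v + v
        data[0][(s1, s2)] += 1
        data[1][(s3,)] += 1
    return data

funcs = {"STAT1": lambda v: v ** 3, "STAT2": lambda v: v * v, "STAT3": lambda v: v + v}
groups = [["STAT1", "STAT2"], ["STAT3"]]
loops = 100_000

assert generic(1000, groups, funcs) == hand_written(1000)  # same counts either way
t_generic = min(timeit.repeat(lambda: generic(loops, groups, funcs), number=1, repeat=3))
t_hand = min(timeit.repeat(lambda: hand_written(loops), number=1, repeat=3))
print(f"generic: {t_generic:.3f} s, hand-written: {t_hand:.3f} s")
```

Taking the minimum of several repeats, rather than a single time.time() difference, reduces noise from other processes.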

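If the arguments get more complex, for example "--stat STAT1 --stat STAT2 --stat STAT3 --stat STAT1 STAT2 STAT3", or the number of loops goes up, the gap between the self-inlined code and the regular Python code keeps growing:

Option 1: 393 s
Option 3: 190 s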

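Typically my users run 50 to 100 million loops, with perhaps 3 groups of 2 to 5 stats each. The stats themselves are not trivial, but the difference in computation time amounts to hours.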
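The reported timings can be turned into a per-iteration overhead and extrapolated to the loop counts mentioned above. The following sketch uses only the numbers stated in the question. It assumes the overhead scales linearly with the loop count and ignores the heavier real stats, so it is a rough lower bound rather than a prediction.

```python
# Timings reported in the question for 9,000,000 loops, in seconds.
loops = 9_000_000
simple_gap = 92 - 56     # option 1 minus option 2, two groups
complex_gap = 393 - 190  # option 1 minus option 3, four groups

overhead_simple = simple_gap / loops    # extra seconds per iteration
overhead_complex = complex_gap / loops

for n in (50_000_000, 100_000_000):
    print(f"{n:,} loops: ~{round(overhead_simple * n)} s extra (two groups), "
          f"~{round(overhead_complex * n)} s extra (four groups)")
```

This works out to about 4 µs versus 23 µs of extra time per iteration. That comes to roughly 200 s and 1128 s of pure overhead at 50 million loops, and 400 s and 2256 s at 100 million loops.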

1 answer

Answer (score: 1)

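I think you are just trying to avoid computing the same stats more than once. Try this. Note that I am using docopt, so I use comma-separated lists. You have worked out some way to do it already but don't show us how, so don't worry about that part; it is not important. The code in parse_args where I build the set of stat names is probably the key.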
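The answer's own code is not included in this copy, so the following is only an illustration of the idea it describes. Each group is given as a comma-separated list, and a set of unique stat names is built so that each stat can be computed once per value. The sketch uses the standard-library argparse module rather than docopt. The names KNOWN_STATS and parse_groups, and the "--stat STAT1,STAT2" syntax, are hypothetical.

```python
import argparse

# Hypothetical set of stats the program knows how to compute.
KNOWN_STATS = {"STAT1", "STAT2", "STAT3"}

def parse_groups(argv=None):
    """Parse repeated --stat options, each holding one comma-separated group."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--stat", action="append", required=True,
                        help="comma-separated group of stats; repeat for more groups")
    args = parser.parse_args(argv)
    groups = [[name.strip() for name in value.split(",") if name.strip()]
              for value in args.stat]
    unique = set()  # every stat named in any group, each listed once
    for group in groups:
        unknown = set(group) - KNOWN_STATS
        if unknown:
            parser.error("unknown stat(s): " + ", ".join(sorted(unknown)))
        unique.update(group)
    return groups, unique

groups, unique = parse_groups(["--stat", "STAT1,STAT2", "--stat", "STAT3", "--stat", "STAT1"])
```

Here STAT1 appears in two groups but only once in unique, so a loop driven by unique computes it once per value and reuses the result for both groups.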