Optimizing lists and CSV reading in Python

Time: 2014-06-20 07:10:51

Tags: python optimization

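I am trying to optimize a Python script that goes through a loop 459*458*23 times.

At the moment the script takes about 2 days to run.

Here is the script: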

    for i in range(0, len(file_names)):
        for q in range(0, len(original_tuples)):
            for j in range(0, len(original_tuples)):
                cur_freq = int(original_tuples[j][0])
                cur_clamp = int(original_tuples[j][1])
                freq_num = int(raw_tuples[j][0])
                clamp_num = int(raw_tuples[j][1])
                perf  = str(cur_freq) + "/" + str(cur_freq)+ "-" + str(cur_clamp) + "/" + file_names[i] + "-perf.csv"
                power = str(cur_freq) + "/" + str(cur_freq)+ "-" + str(cur_clamp) + "/" + file_names[i] + "-power.csv"
                dataset = r_script.parse_files(perf,power)
                a, b,c,d,e,f,g,h,i =  r_script.avg(dataset)
                s = freq_logs[freq_num][0]%(a,b,c,d,h,e,g,f)
                index = s.find('=')+1
                predicted = float(eval(s[index:]))
                switching_power[i][q][freq_num][clamp_num].append(float(predicted))
                real_power[i][freq_num][clamp_num].append(float(i))

                for k in range(0, len(possible_frequency)):
                    if int(possible_frequency[k]) != int(cur_freq):
                        temp_freq  = int(possible_frequency[k])
                        temp_clamp = clamp_num
                        temp_freq_num  = possible_frequency.index(possible_frequency[k])
                        perf1  = str(temp_freq) + "/" + str(temp_freq)+ "-" + str(temp_clamp) + "/" + file_names[i] + "-perf.csv"
                        power1 = str(temp_freq) + "/" + str(temp_freq)+ "-" + str(temp_clamp) + "/" + file_names[i] + "-power.csv"
                        dataset1 = r_script.parse_files(perf1,power1)
                        a1, b1,c1,d1,e1,f1,g1,h1,i1 =  r_script.avg(dataset1)
                        s = freq_logs[temp_freq_num][0]%(a,b,c,d,h,e,g,f)
                        index = s.find('=')+1
                        predicted = float(eval(s[index:]))
                        switching_power[i][q][temp_freq_num][temp_clamp].append(float(predicted))

                for l in range(0, len(possible_frequency)):
                    for m in range(0, len(clamp_range)):
                        if int(clamp_range[m]) != int(cur_clamp):
                            cl_temp_freq  = int(possible_frequency[l])
                            cl_temp_clamp = int(clamp_range[m])
                            cl_temp_freq_num  = int(possible_frequency.index(possible_frequency[l]))
                            cl_temp_clamp_num = int(clamp_range.index(clamp_range[m]))
                            if (cl_temp_clamp_num != cl_temp_clamp):
                                sys.exit("buggy...clamp # not matching")

                            perf2  = str(cl_temp_freq) + "/" + str(cl_temp_freq)+ "-" + str(cl_temp_clamp) + "/" + file_names[i] + "-perf.csv"
                            power2 = str(cl_temp_freq) + "/" + str(cl_temp_freq)+ "-" + str(cl_temp_clamp) + "/" + file_names[i] + "-power.csv"
                            dataset2 = r_script.parse_files(perf2,power2)
                            a2, b2,c2,d2,e2,f2,g2,h2,i2 =  r_script.avg(dataset2)
                            previous_predicted_power = switching_power[i][q][cl_temp_freq_num][temp_clamp][0]
                            clamper = float(temp_clamp)/float(cl_temp_clamp_num)
                            s = clamp_logs[temp_freq_num][0]%(previous_predicted_power, clamper)
                            index = s.find('=')+1
                            predicted = float(eval(s[index:]))
                            switching_power[i][q][temp_freq_num][temp_clamp].append(float(predicted))

    for n in range(0, len(file_names)):
        for fo in range(0, len(original_tuples)):
            for o in range(0, len(original_tuples)):
                freq_num = int(raw_tuples[o][0])
                clamp_num = int(raw_tuples[o][1])
                diff_power[n][fo][freq_num][clamp_num] = float(float(real_power[n][freq_num][clamp_num][0])-float(switching_power[n][fo][freq_num][clamp_num][0]))

Here are the lists:

possible_clamp_levels = int(len(clamp_range)*len(possible_frequency))
original_tuples = []
raw_tuples = []
switching_power = [[[[[] for d in range(0, len(clamp_range))] for c in range(0, len(possible_frequency))] for b in range(0, possible_clamp_levels)] for a in range(0, len(file_names))]
diff_power = [[[[[] for d in range(0, len(clamp_range))] for c in range(0, len(possible_frequency))] for b in range(0, possible_clamp_levels)] for a in range(0, len(file_names))]
real_power = [[[[] for d in range(0, len(clamp_range))]for c in range(0, len(possible_frequency))] for a in range(0, len(file_names))]

for a in range(0, len(possible_frequency)):
    for b in range(0, len(clamp_range)):
        test = (possible_frequency[a], clamp_range[b])
        test1 = (a,b)
        original_tuples.append(test)
        raw_tuples.append(test1)

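If you need any pointers about the script itself in order to help me optimize it, please let me know. freq_logs and clamp_logs are basically linear equations into which the values are substituted. r_script is another script that reads these CSV files; parsing takes less than 10 ms.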

1 Answer:

Answer 0: (score: 0)

Replace index-based iteration with direct iteration over the iterable

Change

for i in range(len(lst)):
    item = lst[i]
    print(item)  # do something useful here

to

for itm in lst:
    print(itm)  # do something useful here

If you really need to know the index of the item currently being processed, use enumerate:

for i, itm in enumerate(lst):
    print(itm)  # do something useful here; i is the index of itm

Your code will look more Pythonic (and you may gain a little speed).
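Applied to the lists from the question, this could look roughly like the sketch below. The values in possible_frequency and clamp_range are made up for illustration; only the list names come from the question. zip walks original_tuples and raw_tuples in step, so the inner loop no longer needs the index j.

```python
# Hypothetical sample values; the real lists come from the question's setup.
possible_frequency = [1200, 1600, 2000]
clamp_range = [0, 1]

original_tuples = []
raw_tuples = []
# enumerate yields the index and the value together.
for a, freq in enumerate(possible_frequency):
    for b, clamp in enumerate(clamp_range):
        original_tuples.append((freq, clamp))
        raw_tuples.append((a, b))

# Walk both lists in step instead of indexing them with j.
for (cur_freq, cur_clamp), (freq_num, clamp_num) in zip(original_tuples, raw_tuples):
    print(cur_freq, cur_clamp, freq_num, clamp_num)
```

This also makes calls such as possible_frequency.index(possible_frequency[k]) unnecessary, since enumerate already hands you k together with the value.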

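Divide and conquer - multiprocessing

If you can redesign the solution, you could, for example, process the data in several groups and merge the results at the end. This assumes that:

  • the job can be split into parts and the results merged later
  • I/O is not the bottleneck (if you are limited by reading from a single disk, a map-reduce approach will not speed things up).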
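A minimal sketch of that idea with the standard multiprocessing module, assuming the work for each entry of file_names is independent of the others (I have not verified this against your full script). process_file, merge and run_parallel are hypothetical names, and the body of process_file is only a stand-in for the real per-file work.

```python
from multiprocessing import Pool


def process_file(file_name):
    """Stand-in for the per-file work (the body of the outer `for i` loop).

    Hypothetical: the real version would parse the CSV files for this
    file_name and return the values computed for it.
    """
    return file_name, len(file_name)


def merge(partial_results):
    """Combine the (file_name, value) pairs returned by the workers."""
    return dict(partial_results)


def run_parallel(file_names, processes=2):
    """Split the job by file name, run the parts in parallel, then merge."""
    with Pool(processes=processes) as pool:
        # pool.map returns the results in the same order as file_names.
        partial_results = pool.map(process_file, file_names)
    return merge(partial_results)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(run_parallel(["bench_a", "bench_bb", "bench_ccc"]))
```

The worker has to be a top-level function so it can be pickled and sent to the worker processes, and the `if __name__ == "__main__":` guard is needed on platforms that start workers by re-importing the main module.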