计算大文本文件的反转

时间:2016-08-03 17:31:10

标签: python mergesort inversion

代码的目标是找出总数。数组的反转。 我的代码成功运行。成功测试了6个元素(所有元素以相反的顺序从最高开始),反转计数= 15。 此外,成功测试了10个元素(所有元素以相反的顺序从最高开始),反转计数= 45 但是,对于包含100k整数的大文件,它需要大约25秒。这是预期的吗?请建议还是我可以进一步降低执行时间? 我刚刚在传统的合并排序算法中做了一个小调整(即计算反转总数的行) 如何进一步缩短整体运行时间?

def mergeSort(final_list):
    global total_count
    if len(final_list)>1:
        mid_no=len(final_list)//2
        left_half=final_list[:mid_no]
        right_half=final_list[mid_no:]

        mergeSort(left_half) 
        mergeSort(right_half)      

        '''Below code is for merging the lists'''
        i=j=k=0 #i is index for left half, j for the right half and k for the resultant list
        while i<len(left_half) and j<len(right_half):
            if left_half[i] < right_half[j]:
                final_list[k]=left_half[i]
                i+=1
                k+=1

            else:
                final_list[k]=right_half[j]

                print 'total count is' 
                print total_count
                #total_count+=len(left_half)-i
                total_count+=len(left_half[i:])
                print 'total_count is '
                print total_count

                print 'pairs are '
                print str(left_half[i:])+' with '+str(right_half[j])
                j+=1
                k+=1




        while i<len(left_half):
            final_list[k]=left_half[i]
            k+=1
            i+=1
        while j<len(right_half):
            final_list[k]=right_half[j]
            j+=1
            k+=1

        '''Code for list merge ends'''

#temp_list=[45,21,23,4,65]
#temp_list=[1,5,2,3,4,6]
#temp_list=[6,5,4,3,2,1]
#temp_list=[1,2,3,4,5,6]
#temp_list=[10,9,8,7,6,5,4,3,2,1]
#temp_list=[1,22,3,4,66,7]
temp_list=[]
f=open('temp_list.txt','r')
for line in f:
    temp_list.append(int(line.strip()))

print 'list is '
print temp_list
print 'list ends'
print temp_list[0]
print temp_list[-1]
'''import time
time.sleep(1000)
print 'hhhhhhhhhh'
'''



total_count=0
mergeSort(temp_list)

print temp_list

1 个答案:

答案 0 :(得分:1)

我找到了它(并通过个人资料验证)

        #total_count+=len(left_half[i:])
        total_count += len(left_half) - i

left_half [i:]在递归函数的主循环中创建一个包含多个元素副本的新列表。这是拼接的巧妙用法,但副作用正在扼杀你的表现。

以下是我如何分解功能以对其进行分析:

def so_merge (final_list, left_half, right_half):
    global total_count
    i=j=k=0 #i is index for left half, j for the right half and k for the resultant list
    while i<len(left_half) and j<len(right_half):
        if left_half[i] < right_half[j]:
            final_list[k]=left_half[i]
            i+=1
            k+=1
        else:
            final_list[k]=right_half[j]
            count1 = get_incriment_bad(left_half, i)
            count2 = get_incriment_good(left_half, i)
            if count1 != count2:
                raise ValueError
            total_count += count1
            j+=1
            k+=1
    finish_left(final_list, left_half, i, k)
    finish_right(final_list, right_half, j, k)

结果显示它花了19.574秒获得len(left_half [i:])

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
199999/1    0.805    0.000   29.562   29.562 week1.py:124(so_mergesort)
99999    7.496    0.000   28.735    0.000 week1.py:104(so_merge)
776644   19.512    0.000   19.574    0.000 week1.py:101(get_incriment_bad)
776644    0.839    0.000    0.895    0.000 week1.py:98(get_incriment_good)
5403164    0.382    0.000    0.382    0.000 {len}
99999    0.273    0.000    0.286    0.000 week1.py:92(finish_right)
99999    0.255    0.000    0.266    0.000 week1.py:86(finish_left)
    1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}