How can I optimize a sorting program in Python?

Time: 2016-08-11 08:53:46

Tags: python sorting

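I need to write a program that takes a list of strings containing integers and words and returns a sorted version of the list. The output should keep words and numbers in the positions where each kind appeared in the original list.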

data=raw_input("Enter data").split(" ")
alpha=[]
num=[]
for item in data:
    if item.isalpha():
        alpha.append(item)
    else:
        num.append(item)
alpha.sort()
num.sort()

i=0
j=0
result=""
for item in data :
  if item.isalpha():
    result +=alpha[i]+" "
    i +=1
  else:
    result +=num[j]+" "
    j +=1

print result

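The code above works fine for me, but I want to use as little memory as possible. How can I reduce it to a single list and the fewest iterations and still get the correct result?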

Input

car truck 8 4 bus 6 1

Output

bus car 1 4 truck 6 8

4 answers:

Answer 0 (score: 3)

This Python 2 / Python 3 code uses an algorithm similar to yours, but it avoids re-testing the data types when building the result string.

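It creates a list, dtypes, that stores each item's data type as a reference to the destination list, alpha or num. This simplifies putting the sorted items back in the correct order.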

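We sort in reverse so that we can .pop() the required item from the end of each list. This is more efficient than using .pop(0), because popping an item from the front of a list requires all the subsequent items to be shifted down on every pop.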

from __future__ import print_function

def parallel_sort(data):
    ''' sort numeric & non-numeric items in str `data` in parallel,
        keeping numeric values in the original numeric slots
        and alpha values in the original alpha slots
    '''
    data = data.split()
    alpha = []
    num = []
    dtypes = [num if item.isdigit() else alpha for item in data]
    for lst, item in zip(dtypes, data):
        lst.append(item)

    alpha.sort(reverse=True)
    num.sort(key=int, reverse=True)
    return ' '.join([lst.pop() for lst in dtypes])

# Test

strings = (
    'car truck 8 4 bus 6 1',
    '9 2 car bus 297',
    'dog ape 1 12 333 emu cat 7 32 zebra bat',
)

for data in strings:
    result = parallel_sort(data)
    print('{!r} -> {!r}'.format(data, result))    

Output

'car truck 8 4 bus 6 1' -> 'bus car 1 4 truck 6 8'
'9 2 car bus 297' -> '2 9 bus car 297'
'dog ape 1 12 333 emu cat 7 32 zebra bat' -> 'ape bat 1 7 12 cat dog 32 333 emu zebra'
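The point about .pop() versus .pop(0) can be checked with a rough sketch like the one below. The helper names drain_from_end and drain_from_front and the sample data are hypothetical, chosen only for illustration; the timings will vary by machine, so none are quoted here.

```python
from timeit import timeit

def drain_from_end(items):
    """Sort descending, then pop() from the end: each pop is O(1)."""
    stack = sorted(items, reverse=True)
    return [stack.pop() for _ in range(len(stack))]

def drain_from_front(items):
    """Sort ascending, then pop(0) from the front: each pop shifts the rest."""
    queue = sorted(items)
    return [queue.pop(0) for _ in range(len(queue))]

# Hypothetical test data: 5000 integers in descending order.
sample = list(range(5000, 0, -1))

# Both strategies yield the items in ascending order.
assert drain_from_end(sample) == drain_from_front(sample)

for func in (drain_from_end, drain_from_front):
    seconds = timeit(lambda: func(sample), number=20)
    print('{0:18} {1:.4f}s'.format(func.__name__, seconds))
```

The gap between the two should widen as the list grows, since the front pops do work proportional to the remaining length.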

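Here is some timeit code to compare the speed of the various algorithms. For small strings, piyush's code (modified to sort the numbers correctly) is the fastest, but for sufficiently large strings my code is a little faster.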

These tests were run on an old 2GHz Pentium 4 machine running Python 2.6 (I had to modify Martijn's code because 2.6 doesn't have dict comprehensions).

from __future__ import print_function
from timeit import Timer

def parallel_sort_piyush(data):
    data = data.split()
    alpha=[]
    num=[]
    for item in data:
        if item.isalpha():
            alpha.append(item)
        else:
            num.append(item)
    alpha.sort()
    num.sort(key=int)

    i = 0
    j = 0
    result = ""
    for item in data :
        if item.isalpha():
            result += alpha[i] + " "
            i +=1
        else:
            result += num[j] + " "
            j +=1
    return result[:-1]

def parallel_sort_acw1668(data):
    data = data.split()
    alphas = sorted([x for x in data if x.isalpha()])
    numbers = sorted([x for x in data if not x.isalpha()], key=int)
    return ' '.join(alphas.pop(0) if x.isalpha() else numbers.pop(0) for x in data)

def parallel_sort_martijn(line):
    type_map = {}
    words = line.split()
    for i, word in enumerate(words):
        type_map.setdefault(word.isdigit(), []).append(i)

    # sort keys
    # sort digits as numbers (natural sort)
    int_for_digits = lambda w: int(w) if w.isdigit() else w
    # sort specific types to the next position for that type 
    #type_to_pos = lambda w, m={k: iter(v) for k, v in type_map.items()}: next(m[w.isdigit()])
    type_to_pos = lambda w, m=dict((k, iter(v)) for k, v in type_map.items()): next(m[w.isdigit()])
    return ' '.join(sorted(sorted(words, key=int_for_digits), key=type_to_pos))

def parallel_sort_PM2R(data):
    ''' sort numeric & non-numeric items in str `data` in parallel,
        keeping numeric values in the original numeric slots
        and alpha values in the original alpha slots
    '''
    data = data.split()
    alpha = []
    num = []
    dtypes = [num if item.isdigit() else alpha for item in data]
    for lst, item in zip(dtypes, data):
        lst.append(item)

    alpha.sort(reverse=True)
    num.sort(key=int, reverse=True)
    return ' '.join([lst.pop() for lst in dtypes])

funcs = (
    parallel_sort_piyush,
    parallel_sort_acw1668,
    parallel_sort_martijn,
    parallel_sort_PM2R,
)

strings = (
    'car truck 8 4 bus 6 1',
    '9 2 car bus 297',
    'dog ape 1 12 333 emu cat 7 32 zebra bat',
    'only alpha words',
    '42 23 17 5',
    '',
)

def test():
    for parallel_sort in funcs:
        print(parallel_sort.__name__)
        for data in strings:
            result = parallel_sort(data)
            print('{0!r} -> {1!r}'.format(data, result)) 
        print()

def verify():
    for data in strings:
        result = [parallel_sort(data) for parallel_sort in funcs]
        r = result[0]
        ok = all(s == r for s in result[1:])
        print('{0}: {1!r} -> {2!r}'.format(ok, data, r))

# Time tests

def time_test(loops, reps):
    ''' Print timing stats for all the functions '''
    timings = []
    for func in funcs:
        fname = func.__name__
        setup = 'from __main__ import datastring, ' + fname
        cmd = fname + '(datastring)'
        t = Timer(cmd, setup)
        result = t.repeat(reps, loops)
        result.sort()
        timings.append((result, fname))

    timings.sort()
    for result, fname in timings:
        print('{0:21} {1}'.format(fname, result))

#test()
verify()

reps = 3
loops = 5000
for datastring in strings:
    print('\n{0!r}'.format(datastring))
    time_test(loops, reps)

print('\n' + '- ' * 32)

datastring = ' '.join(strings * 3)
reps = 3
loops = 256
for i in range(7):
    print('\nlength={0}, loops{1}'.format(len(datastring), loops))
    time_test(loops, reps)
    loops >>= 1
    datastring += datastring

Output

True: 'car truck 8 4 bus 6 1' -> 'bus car 1 4 truck 6 8'
True: '9 2 car bus 297' -> '2 9 bus car 297'
True: 'dog ape 1 12 333 emu cat 7 32 zebra bat' -> 'ape bat 1 7 12 cat dog 32 333 emu zebra'
True: 'only alpha words' -> 'alpha only words'
True: '42 23 17 5' -> '5 17 23 42'
True: '' -> ''

'car truck 8 4 bus 6 1'
parallel_sort_piyush  [0.16613292694091797, 0.1678168773651123, 0.17213606834411621]
parallel_sort_PM2R    [0.19424915313720703, 0.19544506072998047, 0.1982269287109375]
parallel_sort_acw1668 [0.26951003074645996, 0.27229499816894531, 0.2791450023651123]
parallel_sort_martijn [0.38483405113220215, 0.39478588104248047, 0.41512084007263184]

'9 2 car bus 297'
parallel_sort_piyush  [0.12851309776306152, 0.1293489933013916, 0.13681578636169434]
parallel_sort_PM2R    [0.16056299209594727, 0.16071605682373047, 0.16141486167907715]
parallel_sort_acw1668 [0.22338008880615234, 0.22396492958068848, 0.22573399543762207]
parallel_sort_martijn [0.31512093544006348, 0.31612205505371094, 0.3207099437713623]

'dog ape 1 12 333 emu cat 7 32 zebra bat'
parallel_sort_piyush  [0.22555994987487793, 0.22738313674926758, 0.2362220287322998]
parallel_sort_PM2R    [0.2644810676574707, 0.26884698867797852, 0.30507016181945801]
parallel_sort_acw1668 [0.34023594856262207, 0.3423771858215332, 0.34470510482788086]
parallel_sort_martijn [0.49398708343505859, 0.49546003341674805, 0.50142598152160645]

'only alpha words'
parallel_sort_piyush  [0.069504022598266602, 0.06974482536315918, 0.077678918838500977]
parallel_sort_PM2R    [0.097023963928222656, 0.10160112380981445, 0.10884809494018555]
parallel_sort_acw1668 [0.16136789321899414, 0.16139507293701172, 0.16254186630249023]
parallel_sort_martijn [0.20757603645324707, 0.20803117752075195, 0.21358394622802734]

'42 23 17 5'
parallel_sort_piyush  [0.12735700607299805, 0.13022804260253906, 0.13068699836730957]
parallel_sort_PM2R    [0.14782595634460449, 0.14879608154296875, 0.14986395835876465]
parallel_sort_acw1668 [0.2091820240020752, 0.21131205558776855, 0.21974492073059082]
parallel_sort_martijn [0.27461814880371094, 0.27850794792175293, 0.27975988388061523]

''
parallel_sort_piyush  [0.024302959442138672, 0.024441957473754883, 0.031994104385375977]
parallel_sort_PM2R    [0.046028852462768555, 0.046576023101806641, 0.046601057052612305]
parallel_sort_acw1668 [0.091669082641601562, 0.091941118240356445, 0.092013120651245117]
parallel_sort_martijn [0.094310998916625977, 0.094748973846435547, 0.095381021499633789]

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

length=320, loops256
parallel_sort_PM2R    [0.086880922317504883, 0.087455987930297852, 0.087592840194702148]
parallel_sort_piyush  [0.089211940765380859, 0.089900970458984375, 0.10294389724731445]
parallel_sort_acw1668 [0.1074371337890625, 0.10886883735656738, 0.11089897155761719]
parallel_sort_martijn [0.15668392181396484, 0.15747618675231934, 0.15912413597106934]

length=640, loops128
parallel_sort_PM2R    [0.086150884628295898, 0.088001012802124023, 0.091377019882202148]
parallel_sort_piyush  [0.088989019393920898, 0.089003086090087891, 0.095314979553222656]
parallel_sort_acw1668 [0.10632085800170898, 0.10663104057312012, 0.10766291618347168]
parallel_sort_martijn [0.15318799018859863, 0.1544189453125, 0.1579129695892334]

length=1280, loops64
parallel_sort_PM2R    [0.086312055587768555, 0.08635711669921875, 0.08643794059753418]
parallel_sort_piyush  [0.089561939239501953, 0.089729070663452148, 0.09730219841003418]
parallel_sort_acw1668 [0.10796380043029785, 0.10807299613952637, 0.10920286178588867]
parallel_sort_martijn [0.15214014053344727, 0.15265083312988281, 0.1530609130859375]

length=2560, loops32
parallel_sort_PM2R    [0.086397886276245117, 0.086937904357910156, 0.12731385231018066]
parallel_sort_piyush  [0.090615034103393555, 0.091663837432861328, 0.1024620532989502]
parallel_sort_acw1668 [0.11186099052429199, 0.113922119140625, 0.11545681953430176]
parallel_sort_martijn [0.1525418758392334, 0.15349197387695312, 0.15409398078918457]

length=5120, loops16
parallel_sort_PM2R    [0.086872100830078125, 0.089444875717163086, 0.092289924621582031]
parallel_sort_piyush  [0.09121394157409668, 0.092126131057739258, 0.099750041961669922]
parallel_sort_acw1668 [0.11780095100402832, 0.11782479286193848, 0.11829781532287598]
parallel_sort_martijn [0.1548459529876709, 0.1556861400604248, 0.16153383255004883]

length=10240, loops8
parallel_sort_PM2R    [0.087334871292114258, 0.091704845428466797, 0.092611074447631836]
parallel_sort_piyush  [0.092457056045532227, 0.11381292343139648, 0.11914896965026855]
parallel_sort_acw1668 [0.13423800468444824, 0.14225006103515625, 0.14964199066162109]
parallel_sort_martijn [0.15410614013671875, 0.15437102317810059, 0.15663385391235352]

length=20480, loops4
parallel_sort_PM2R    [0.089828014373779297, 0.089951992034912109, 0.091377973556518555]
parallel_sort_piyush  [0.093550920486450195, 0.093831062316894531, 0.10358881950378418]
parallel_sort_martijn [0.15582108497619629, 0.15685820579528809, 0.15839505195617676]
parallel_sort_acw1668 [0.15901684761047363, 0.15937495231628418, 0.16479396820068359]

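Here is the timeit output from a run on Python 3.6; I couldn't run Martijn's code there because it relies on a Python 2 feature that Python 3 doesn't support (the ability to compare strings with integers).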

'car truck 8 4 bus 6 1'
parallel_sort_piyush  [0.1639411759988434, 0.1641379140000936, 0.16782489100114617]
parallel_sort_PM2R    [0.19857631000013498, 0.20035489499969117, 0.20133615400118288]
parallel_sort_acw1668 [0.23366880700086767, 0.23590722699918842, 0.23592727899995225]

'9 2 car bus 297'
parallel_sort_piyush  [0.13465033200009202, 0.13776905200029432, 0.18482623500131012]
parallel_sort_PM2R    [0.17675577999943926, 0.17687105299955874, 0.17699695900046208]
parallel_sort_acw1668 [0.1984550399993168, 0.2004171780008619, 0.20442987299975357]

'dog ape 1 12 333 emu cat 7 32 zebra bat'
parallel_sort_piyush  [0.23316595300093468, 0.23489147600048454, 0.23842128900105308]
parallel_sort_PM2R    [0.26679581300049904, 0.3011208970001462, 0.3172619519991713]
parallel_sort_acw1668 [0.3200034309993498, 0.3352665239999624, 0.33631655700082774]

'only alpha words'
parallel_sort_piyush  [0.09549654300099064, 0.09623185599957651, 0.10429198799829464]
parallel_sort_PM2R    [0.13186385899825837, 0.13212396900053136, 0.13451194299886993]
parallel_sort_acw1668 [0.1535412909997831, 0.1543631849999656, 0.15927939099856303]

'42 23 17 5'
parallel_sort_piyush  [0.11825022300035926, 0.11878074699961871, 0.1252167599996028]
parallel_sort_PM2R    [0.1604483920000348, 0.16769106699939584, 0.1691959849995328]
parallel_sort_acw1668 [0.18632163399888668, 0.1896887399998377, 0.1903514539990283]

''
parallel_sort_piyush  [0.02776817599988135, 0.028196225999636226, 0.03495696800018777]
parallel_sort_PM2R    [0.08110263499838766, 0.08155031299975235, 0.08626208599889651]
parallel_sort_acw1668 [0.0864310500001011, 0.09174712499952875, 0.09336608200101182]

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

length=320, loops256
parallel_sort_PM2R    [0.07467737899969507, 0.0790010450000409, 0.08554027799982578]
parallel_sort_piyush  [0.08288037499914935, 0.0869976689991745, 0.08999215999938315]
parallel_sort_acw1668 [0.09541546199943696, 0.09584146999986842, 0.10051055599979009]

length=640, loops128
parallel_sort_PM2R    [0.07285131699973135, 0.07332802500059188, 0.0734102949991211]
parallel_sort_piyush  [0.08446464299959189, 0.08626877000097011, 0.0912491470007808]
parallel_sort_acw1668 [0.09565138899961312, 0.09577698600151052, 0.1005053829994722]

length=1280, loops64
parallel_sort_PM2R    [0.0734182439991855, 0.07344354999986535, 0.07376041499992425]
parallel_sort_piyush  [0.08504722700126877, 0.08517580999978236, 0.09426504600014596]
parallel_sort_acw1668 [0.09750029599854315, 0.09771097199882206, 0.098332843001117]

length=2560, loops32
parallel_sort_PM2R    [0.0732510199995886, 0.07328447399959259, 0.0746706619993347]
parallel_sort_piyush  [0.08774417499989795, 0.08785101400098938, 0.09428778500114277]
parallel_sort_acw1668 [0.10173674399993615, 0.103946167999311, 0.11013430999992124]

length=5120, loops16
parallel_sort_PM2R    [0.07310179399974004, 0.07344265099891345, 0.07423899999957939]
parallel_sort_piyush  [0.08817732100033027, 0.0979379299988068, 0.10110497500136262]
parallel_sort_acw1668 [0.10930270000062592, 0.11099402399850078, 0.11111589400024968]

length=10240, loops8
parallel_sort_PM2R    [0.0742019289991731, 0.0743915310013108, 0.08267202100068971]
parallel_sort_piyush  [0.0880410829995526, 0.08827138900051068, 0.09606961099962064]
parallel_sort_acw1668 [0.12271693899856473, 0.1237988149987359, 0.1242337999992742]

length=20480, loops4
parallel_sort_PM2R    [0.07891896799992537, 0.08560944000055315, 0.09119457000088005]
parallel_sort_piyush  [0.08942042499984382, 0.0914211269991938, 0.0983720500007621]
parallel_sort_acw1668 [0.15465029900042282, 0.17178430700005265, 0.1722458230015036]

Answer 1 (score: 2)

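You can't get away without either sorting twice or creating some kind of mapping for the numeric positions here. Either partition, sort the partitions, and recombine based on the type at each position in the original word list (as you did), or re-sort a sorted word list based on a type-position map.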

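To create the type-position map, record the positions of each 'type' first (using str.isdigit() to tell them apart). You can then sort all the words regardless of type, and then re-sort them according to the type map: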

type_map = {}
words = line.split()
for i, word in enumerate(words):
    type_map.setdefault(word.isdigit(), []).append(i)

# sort keys
# sort digits as numbers (natural sort)
int_for_digits = lambda w: int(w) if w.isdigit() else w
# sort specific types to the next position for that type 
type_to_pos = lambda w, m={k: iter(v) for k, v in type_map.items()}: next(m[w.isdigit()])

sorted_line = sorted(sorted(words, key=int_for_digits), key=type_to_pos)

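Note that the type_to_pos lambda creates a mapping from 'type' (the str.isdigit() result) to position iterators, and these are exhausted after sorting. Make sure you re-create the lambda every time you need to sort a line.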

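I also incorporated a natural sort, to make sure that 10 sorts after 9 rather than before it, as it would in a lexicographical sort on strings.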
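The exhaustion of the position iterators can be seen in a small sketch. The helper make_type_to_pos is a hypothetical wrapper, not part of this answer, that builds a fresh key each time it is called; a plain string sort stands in for the inner sort because the sample numbers are single digits.

```python
def make_type_to_pos(type_map):
    # Hypothetical helper: builds fresh position iterators on every call.
    iters = {k: iter(v) for k, v in type_map.items()}
    return lambda w: next(iters[w.isdigit()])

words = 'car truck 8 4 bus 6 1'.split()
type_map = {}
for i, word in enumerate(words):
    type_map.setdefault(word.isdigit(), []).append(i)

key = make_type_to_pos(type_map)
first = sorted(sorted(words), key=key)   # consumes every stored position

try:
    sorted(sorted(words), key=key)       # same key object: iterators are empty
    reusable = True
except StopIteration:
    reusable = False

# A freshly built key works again.
second = sorted(sorted(words), key=make_type_to_pos(type_map))
print(first, reusable, second)
```

Wrapping the key construction in a function like this is one way to avoid accidentally reusing a spent key.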

Demo:

>>> line = 'car truck 8 4 bus 6 1'
>>> type_map = {}
>>> words = line.split()
>>> for i, word in enumerate(words):
...     type_map.setdefault(word.isdigit(), []).append(i)
...
>>> int_for_digits = lambda w: int(w) if w.isdigit() else w
>>> type_to_pos = lambda w, m={k: iter(v) for k, v in type_map.items()}: next(m[w.isdigit()])
>>> sorted(sorted(words, key=int_for_digits), key=type_to_pos)
['bus', 'car', '1', '4', 'truck', '6', '8']

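This makes no real difference to memory use compared with your version; both solutions need to create some extra lists (with a total length equal to the number of words).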
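On Python 3 the inner sort above raises a TypeError for mixed input, because int and str keys cannot be compared with each other. The sketch below is one possible adaptation, an assumption rather than this answer's code: it swaps the key for a tuple so that numbers are only ever compared with numbers and words with words. The names parallel_sort_py3 and natural_key are hypothetical.

```python
def parallel_sort_py3(line):
    words = line.split()

    # Record which positions each type occupies (True for digit strings).
    type_map = {}
    for i, word in enumerate(words):
        type_map.setdefault(word.isdigit(), []).append(i)

    # Tuple key: numbers first by value, then words alphabetically.
    # The leading 0/1 means an int is never compared with a str.
    def natural_key(w):
        return (0, int(w), '') if w.isdigit() else (1, 0, w)

    # Fresh iterators on every call, so the function can be reused.
    position_iters = {k: iter(v) for k, v in type_map.items()}

    def type_to_pos(w):
        return next(position_iters[w.isdigit()])

    return ' '.join(sorted(sorted(words, key=natural_key), key=type_to_pos))

print(parallel_sort_py3('car truck 8 4 bus 6 1'))
```

The tuple key keeps the two-sort structure of the approach intact; only the comparison rule of the first sort changes.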

Answer 2 (score: 0)

data = raw_input("Enter data:\n").split()
alphas = sorted([x for x in data if x.isalpha()])
numbers = sorted([x for x in data if not x.isalpha()], key=int)
output = ' '.join(alphas.pop(0) if x.isalpha() else numbers.pop(0) for x in data)
print(output)
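As the top-scoring answer points out, .pop(0) shifts every remaining item on each call. A possible variant, not this answer author's code, keeps the same structure but stores the sorted items in collections.deque, whose popleft() runs in constant time. The function name parallel_sort_deque is hypothetical.

```python
from collections import deque

def parallel_sort_deque(line):
    data = line.split()
    # Sorted words and sorted numbers, each in a deque for O(1) front pops.
    alphas = deque(sorted(x for x in data if x.isalpha()))
    numbers = deque(sorted((x for x in data if not x.isalpha()), key=int))
    # Refill each slot with the next item of the type originally found there.
    return ' '.join(
        alphas.popleft() if x.isalpha() else numbers.popleft() for x in data
    )

print(parallel_sort_deque('car truck 8 4 bus 6 1'))
```

Like the original, this assumes every token that is not purely alphabetic can be converted with int().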

Answer 3 (score: 0)

I reduced the code to a single list.
