Arrange the elements of a sorted list so that all redundant elements are on the left/right side of the list

Time: 2015-01-08 11:12:10

Tags: python list

Is there a better way to solve this problem than the following?

def rendered(arr):
    single_elements = []
    double_elements = []

    for i in xrange(len(arr)-1):
        if arr[i] == arr[i+1] or arr[i] == arr[i-1]:
            double_elements.append(arr[i])
        else:
            single_elements.append(arr[i])

    # Guard against an empty double_elements list (no repeats seen so far).
    if double_elements and arr[-1] == double_elements[-1]:
        double_elements.append(arr[-1])
    else:
        single_elements.append(arr[-1])

    return single_elements+double_elements

arr = [1,2,3,3,3,4,5,6,7,7,7,7,8,8,8,9]

'''Expected output: arr = [1,2,4,5,6,9,3,3,3,7,7,7,7,8,8,8]'''

print rendered(arr)

6 answers:

Answer 0 (score: 3)

You can't do it nicely in one line, but I think it's best to keep it O(N):

>>> from itertools import groupby
>>> arr = [1,2,3,3,3,4,5,6,7,7,7,7,8,8,8,9]
>>> a, b = [], []
>>> for k, g in groupby(arr):
        group = list(g)
        (a if len(group)<2 else b).extend(group)


>>> print a + b
[1, 2, 4, 5, 6, 9, 3, 3, 3, 7, 7, 7, 7, 8, 8, 8]
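
For readers on Python 3, the same grouping idea can be wrapped in a function. This is only a minimal sketch: the name `split_runs` is hypothetical and not part of the answer, and it assumes the input is already sorted so that equal elements are adjacent.

```python
from itertools import groupby

def split_runs(arr):
    """Return the singletons first, then the repeated runs, keeping their order."""
    singles, repeats = [], []
    for _, run in groupby(arr):
        run = list(run)  # materialize the run so its length can be checked
        (singles if len(run) < 2 else repeats).extend(run)
    return singles + repeats

print(split_runs([1, 2, 3, 3, 3, 4, 5, 6, 7, 7, 7, 7, 8, 8, 8, 9]))
```

Each element is visited once by `groupby` and appended once, which is what keeps the approach linear.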

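Answer 1 (score: 2)

Your approach is the most efficient. You can make a few changes to speed it up by using enumerate instead of repeated indexing, which cuts the run time by about 15% on Python 2 and 25% on Python 3: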

single_elements = []
double_elements = []
for i, ele in enumerate(arr[:-1], 1):
    if ele == arr[i] or ele == arr[i-2]:
        double_elements.append(ele)
    else:
        single_elements.append(ele)
ele = arr[-1]
if double_elements and ele == double_elements[-1]:
    double_elements.append(ele)
else:
    single_elements.append(ele)
single_elements.extend(double_elements)

Or, if you want fewer lines:

sin_ele = []
dbl_ele = []
for i, ele in enumerate(arr[:-1], 1):
    dbl_ele.append(ele) if ele == arr[i] or ele == arr[i-2] else sin_ele.append(ele)
ele = arr[-1]
dbl_ele.append(ele) if dbl_ele and ele == dbl_ele[-1] else sin_ele.append(ele)
sin_ele.extend(dbl_ele)

Some timings, with a version that also covers single-element and empty arrays:

def sort_dups(arr):
    if len(arr) < 2:
        return arr
    sin_ele = []
    dbl_ele = []
    for i, ele in enumerate(arr[:-1], 1):
        dbl_ele.append(ele) if ele == arr[i] or ele == arr[i - 2] else sin_ele.append(ele)
    ele = arr[-1]
    dbl_ele.append(ele) if dbl_ele and ele == dbl_ele[-1] else sin_ele.append(ele)
    sin_ele.extend(dbl_ele)
    return sin_ele


In [38]: timeit sort_dups(arr)
100000 loops, best of 3: 4.69 µs per loop

In [39]: timeit f(arr)
100000 loops, best of 3: 8.05 µs per loop

In [40]: %%timeit
repeatedElements = []
[num for (i, num) in enumerate(arr[:-1]) if not
      (arr[i] == arr[i+1] or arr[i] == arr[i-1]) or
       repeatedElements.append(num)] + repeatedElements
   ....: 
100000 loops, best of 3: 5.38 µs per loop

Empty and single-element lists:

In [74]: sort_dups([1, 2, 2, 3, 5, 5, 5])
Out[74]: [1, 3, 2, 2, 5, 5, 5]

In [75]: sort_dups([1, 1, 1, 1, 2])
Out[75]: [2, 1, 1, 1, 1]

In [76]: sort_dups([])
Out[76]: []    

In [77]: sort_dups([0])
Out[77]: [0]

Slightly larger input:

In [59]: arr = [1,2,3,3,3,4,5,6,7,7,7,7,8,8,8,9,12,12,12,14,15,15,15,19,20]

In [60]: timeit f(arr)
100000 loops, best of 3: 14.2 µs per loop

In [61]: timeit sort_dups(arr)
100000 loops, best of 3: 7.81 µs per loop

In [71]: arr+= [None]

In [72]: %%timeit                                                          
repeatedElements = []
[num for (i, num) in enumerate(arr[:-1]) if not
      (arr[i] == arr[i+1] or arr[i] == arr[i-1]) or
       repeatedElements.append(num)] + repeatedElements
   ....: 
100000 loops, best of 3: 10.1 µs per loop

In [93]: %%timeit
a, b = [], []
for i, x in enumerate(arr):
    (b if (x in arr[i-1:i+2:2] if i > 0 else x in arr[1:2]) else a).append(x)
   ....: 
10000 loops, best of 3: 14 µs per loop


In [110]:  arr = [1,2,3,3,3,4,5,6,7,7,7,7,8,8,8,9,12,12,12,14,15,15,15,19,20]

In [111]: timeit reorderSequence(arr)
100000 loops, best of 3: 7.85 µs per loop

In [112]: timeit sort_dups(arr)
100000 loops, best of 3: 4.78 µs per loop

In [110]:  arr = [1,2,3,3,3,4,5,6,7,7,7,7,8,8,8,9,12,12,12,14,15,15,15,19,20]

In [119]: timeit cython_sort_dups(arr)
1000000 loops, best of 3: 1.38 µs per loop

Answer 2 (score: 0)

How about this? Let me know if you need further explanation.

arr = [1,2,3,3,3,4,5,6,7,7,7,7,8,8,8,9]

repeatedElements = [i for i in arr if arr.count(i) > 1]
newList = [i for i in arr if i not in repeatedElements] + repeatedElements

print newList

>>> [1, 2, 4, 5, 6, 9, 3, 3, 3, 7, 7, 7, 7, 8, 8, 8]
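
Note that `arr.count(i)` rescans the whole list for every element, and `i not in repeatedElements` is a list scan as well, so the cost grows roughly quadratically with the input size. A sketch of the same idea that counts only once, using `collections.Counter`, follows. This swaps in a different technique from the one in the answer, the name `reorder_by_count` is hypothetical, it is written for Python 3, and it assumes the elements are hashable.

```python
from collections import Counter

def reorder_by_count(arr):
    """Singletons first, then every element that occurs more than once."""
    counts = Counter(arr)  # one pass to count every value
    singles = [x for x in arr if counts[x] == 1]
    repeats = [x for x in arr if counts[x] > 1]
    return singles + repeats

print(reorder_by_count([1, 2, 3, 3, 3, 4, 5, 6, 7, 7, 7, 7, 8, 8, 8, 9]))
```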

Answer 3 (score: 0)

OK, @jamylak. I raise you.

arr = [1,2,3,3,3,4,5,6,7,7,7,7,8,8,8,9]

arr +=  [None]
repeatedElements = []
print [num for (i, num) in enumerate(arr[:-1]) if not 
      (arr[i] == arr[i+1] or arr[i] == arr[i-1]) or
       repeatedElements.append(num)] + repeatedElements

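I tested this with time.clock(), scaling the input up to 10*, 100*, and 1000* arr. It came out quite a bit faster than yours (and otherwise only marginally faster), at least on my machine.

In fact, it is also faster than the OP's. Game, set, match.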
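
`time.clock()` was removed in Python 3.8, so on a current interpreter a comparable measurement could be taken with `timeit` instead. The following is a rough sketch only: `candidate` is a stand-in based on the groupby answer (not the code that was timed above), the repeat counts are arbitrary assumptions, and the absolute numbers will differ from machine to machine.

```python
import timeit
from itertools import groupby

def candidate(arr):
    """Stand-in function to time; swap in any of the answers here."""
    singles, repeats = [], []
    for _, run in groupby(arr):
        run = list(run)
        (singles if len(run) < 2 else repeats).extend(run)
    return singles + repeats

base = [1, 2, 3, 3, 3, 4, 5, 6, 7, 7, 7, 7, 8, 8, 8, 9]
calls = 10
timings = {}
for factor in (10, 100, 1000):
    data = sorted(base * factor)
    # Five batches of `calls` calls each; the minimum is the least noisy figure.
    best = min(timeit.repeat(lambda: candidate(data), number=calls, repeat=5))
    timings[factor] = best / calls  # seconds per call
    print("%5d* arr: %.6f s per call" % (factor, timings[factor]))
```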

Answer 4 (score: 0)

>>> arr = [1,2,3,3,3,4,5,6,7,7,7,7,8,8,8,9]
>>> a, b = [], []
>>> for i, x in enumerate(arr):
      (b if (x in arr[i-1:i+2:2] if i > 0 else x in arr[1:2]) else a).append(x)

>>> print a + b
[1, 2, 4, 5, 6, 9, 3, 3, 3, 7, 7, 7, 7, 8, 8, 8]

@Eithos, I raise you again.

Some tests:

def f(arr):
  a, b = [], []
  for i, x in enumerate(arr):
    (b if (x in arr[i-1:i+2:2] if i > 0 else x in arr[1:2]) else a).append(x)
  return a + b

>>> f([1, 2, 2, 3, 5, 5, 5])
[1, 3, 2, 2, 5, 5, 5]
>>> f([1, 1, 1, 1, 2])
[2, 1, 1, 1, 1]
>>> f([])
[]
>>> f([0])
[0]
>>> f([9, 10, 10, 11, 12, 12, 13, 14, 15, 15])
[9, 11, 13, 14, 10, 10, 12, 12, 15, 15]
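
Hand-picked cases like these could be complemented by a randomized cross-check against a slow but obviously correct reference. The sketch below is written for Python 3; `reference` and `check` are hypothetical helper names, and `f` is repeated so that the block runs on its own.

```python
import random

def f(arr):
    a, b = [], []
    for i, x in enumerate(arr):
        (b if (x in arr[i-1:i+2:2] if i > 0 else x in arr[1:2]) else a).append(x)
    return a + b

def reference(arr):
    """Quadratic but easy to verify: count every element directly."""
    return ([x for x in arr if arr.count(x) == 1] +
            [x for x in arr if arr.count(x) > 1])

def check(func, trials=200, seed=0):
    """Compare func with reference on random sorted lists of length 0 to 20."""
    rng = random.Random(seed)
    for _ in range(trials):
        arr = sorted(rng.randint(0, 9) for _ in range(rng.randint(0, 20)))
        assert func(list(arr)) == reference(arr), arr
    return True

print(check(f))
```

Passing a copy (`list(arr)`) matters for candidates that mutate their argument, such as the versions that append `None` to the input.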

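Answer 5 (score: 0)

(OK, I know this is unusual: three separate answers. But to me... it seems to make sense.)

Edit (1): Added Version A2 (increased optimization) >>> see the tests below

Edit (2): Added Version A3 (extreme optimization) >>> see the tests below

Just when you all thought this was over and done with... I have this to say: Um, no.

I raise you all (@jamylak, @Padraic Cunningham, @Prashant Kumar). I challenge anyone to come up with a faster algorithm. If that happens, I will happily admit defeat and get on with my life. Until then...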

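The algorithm

I realized I had become too fixated on producing perfect code with the fewest possible lines (partly because of jamylak, whose last algorithm really... amazed me. I had never thought of using the ternary operator that way. So, kudos.).

Initially, I came up with a modified version of my second answer, because I wanted to overtake jamylak quickly and with the least effort, and that seemed like the way to do it. But it became rather hacky and unclear, so it is not ideal. You don't want colleagues to have to work out what is going on in an algorithm that is starting to look like the one below.

Version 1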

#...This

def version1Func(arr):

    arr +=  [None]
    repeatedElements = []
    return [num for (i, num) in enumerate(arr[:-1]) if not
        (arr[i] == arr[i+1] or arr[i] == arr[i-1]) or
        repeatedElements.append(num)] + repeatedElements

Version 2

# ...became this

def version2Func(arr):

    arr +=  [None]
    repeatedSet = set([])
    repeatedElements = []
    return [num for (i, num) in enumerate(arr[:-1]) if (
        repeatedElements.append(num) if num in repeatedSet else (
        repeatedSet.add(num) or repeatedElements.append(num)) if (
        num in (arr[i+1], arr[i-1])) else True
    )] + repeatedElements

# See? This becomes difficult to understand for all but those intimately 
# familiar with the abuse and hacks that are employed here. Still, it's fairly
# effective and, hopefully, if ever used, shouldn't cause any bugs afaik.

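Both are fast.

During testing (on Python 2.6.4, not 2.7 as I said earlier; my mistake), the first was faster than both jamylak's and Padraic's on smaller data sets. As the data grows, this difference becomes smaller. It is small enough that Python 3 (or other confounding factors) could give the other algorithm the advantage (as Padraic's tests suggest, where his is slightly faster).

For very small data sets, e.g. <= 50 elements, the second version is slightly slower. As the size increases, the difference becomes fairly noticeable, as it really starts to pull ahead in relative terms (*1). In a way, that already makes it a good candidate for the fastest algorithm here, since we tend to care more about large data sets when deciding whether time complexity is worth addressing.

But... moving on; the following algorithm is the fastest I have produced, running in the best case at almost twice the speed of Version 1.

(*1) I noticed later that this is no longer the case once Padraic's algorithm is placed inside a function. It gets faster. Now Padraic's and Version 2 seem to be on par again.

Version A1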

def reorderSequence(seq):

    seqLength = len(seq)
    seqSetLength = len(set(seq))

    if seqSetLength != seqLength and seqLength >= 3:

        indexLength = seqLength - 1
        index = 0
        newList = []
        repeatedList = []
        repeatedNum = 0
        currentItem = 0

        while True:

            if index >= indexLength:
                lastItem = seq[indexLength]
                if lastItem != repeatedList[-1]:
                    newList.append(lastItem)
                return newList + repeatedList

            baseIndex = index
            baseNum = seq[index]

            while True:

                # Checks if the next number in the list is the same and
                # keeps resetting the while loop (with `continue`) until 
                # this condition is no longer True.
                nextItem = seq[index+1]
                if baseNum == nextItem:
                    repeatedNum = nextItem
                    index+=1
                    if index < indexLength:
                        continue
                    else:
                        index+=1
                        break

                # If the previous condition failed, this `if block` will
                # confirm that the current number is a repeat of the last
                # one and set the baseNum to the next number; it will repeat
                # the while loop (with `continue`) because of the possibility
                # that with the next number begins a new series of redundant
                # elements, thereby keeping the collection growing before
                # finally adding it to the 'repeatedList'. But if the next
                # number isn't the beginning of a new series...

                currentItem = seq[index]
                if currentItem == repeatedNum:
                    baseNum = nextItem
                    index+=1
                    if index < indexLength:
                        continue
                    else:
                        break

                else:
                    # .. it will append it to this newList, break
                    # to the outer-While...
                    newList.append(currentItem)
                break

            # ...and, at this point, it will slice the sequence according
            # to the outer-While's baseIndex and inner-While's updated index
            # and extend the repeatedList.
            if baseIndex != index:
                repeatedList.extend(seq[baseIndex:index])

            index+=1
    else:
        return seq

Version A2 - Edit (1) >>> optimized

def reorderSequence(seq):

    seqLength = len(seq)

    if seqLength >= 3:

        indexLength = seqLength - 1
        index = 0
        baseIndex = index
        newList = []
        repeatedList = []
        baseNum = seq[index]
        nextNum = seq[index+1]
        repeated = True if baseNum == nextNum else False

        while True:

            if index >= indexLength:
                return newList + repeatedList

            while repeated:

                if baseNum == nextNum:
                    index+=1
                    if index < indexLength:
                        nextNum = seq[index+1]
                        continue

                index+=1
                if index < indexLength:
                    baseNum = nextNum
                    nextNum = seq[index+1]
                    if baseNum == nextNum:
                        continue
                    else:
                        repeated = False
                else:
                    if baseNum != nextNum:
                        repeated = False

                repeatedList.extend(seq[baseIndex:index])
                baseIndex = index
                break

            while not repeated:

                if baseNum != nextNum:
                    baseNum = nextNum
                    index+=1
                    if index < indexLength:
                        nextNum = seq[index+1]
                        continue
                    else:
                        index+=1

                else:
                    repeated = True

                newList.extend(seq[baseIndex:index])
                baseIndex = index
                break

    else:
        return seq

Version A3 - Edit (2) >>> extremely optimized!

def reorderSequence(seq):

    sliceIndex = baseIndex = index = 0
    newList = []
    baseNum = seq[index]
    nextNum = seq[index+1]
    repeated = True if baseNum == nextNum else False

    try:

        while True:

            while repeated:

                if baseNum == nextNum:
                    index+=1
                    nextNum = seq[index+1]
                    continue

                index+=1
                baseNum = nextNum
                nextNum = seq[index+1]
                if baseNum == nextNum:
                    continue
                else:
                    repeated = False

                newList.extend(seq[baseIndex:index])
                baseIndex = index
                break

            while not repeated:

                if baseNum != nextNum:
                    baseNum = nextNum
                    index+=1
                    nextNum = seq[index+1]
                    continue
                else:
                    repeated = True

                newList[sliceIndex:sliceIndex] = seq[baseIndex:index]
                sliceIndex += index - baseIndex
                baseIndex = index
                break

    except IndexError:

        if repeated:

            if seq[-1] == seq[-2]:
                newList.extend(seq[baseIndex:index+1])

            if seq[-1] != seq[-2]:
                newList[sliceIndex] = seq[-1]
                newList.extend(seq[baseIndex:index])

        if not repeated:
            newList[sliceIndex:sliceIndex] = seq[baseIndex:index+1]

        return newList

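As is obvious, at this point I stopped caring about making the code elegant, short, and so on. Elegance is fun, but sometimes it has to be sacrificed to squeeze out the most juice. And given that the OP said that what he cares about is efficiency, whether there are fewer lines or not shouldn't count.

Other answers (for reference)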

# jamylak's:
def f(arr):
    a, b = [], []
    for i, x in enumerate(arr):
        (b if (x in arr[i-1:i+2:2] if i > 0 else x in arr[1:2]) else a).append(x)
    return a + b

# Padraic's:
def sort_dups(arr):
    if len(arr) < 2:
        return arr
    sin_ele = []
    dbl_ele = []
    for i, ele in enumerate(arr[:-1], 1):
        dbl_ele.append(ele) if ele == arr[i] or ele == arr[i - 2] else sin_ele.append(ele)
    ele = arr[-1]
    dbl_ele.append(ele) if dbl_ele and ele == dbl_ele[-1] else sin_ele.append(ele)
    sin_ele.extend(dbl_ele)
    return sin_ele

Results (Test 1)

# Using this as argument..
arr = sorted([1,2,4,5,6,7,7,7,7,8,8,8,9,12,12,12,14,15,15,15,19,20] * x) 
# In this case, x will be 10000 or 100

# ..and time.clock() as timer:

# jamylak's:
>>> 0.134921406994  +- 0.001     # 10000*
>>> 0.00127404297442             # 100*

# Padraic's:
>>> 0.0626158414828 +- 0.001     # 10000*
>>> 0.000532060143703            # 100*

# Mine - Version 1
>>> 0.0728380523271 +- 0.002     # 10000*
>>> 0.000671155257454            # 100*

# Mine - Version 2
>>> 0.0612159302306 +- 0.001     # 10000*
>>> 0.000565767241821            # 100*

# Mine - Version A1
>>> 0.0519618384449 +- 0.001     # 10000*
>>> 0.000506459816019            # 100*

Results (Test 2) - Edit (1)

# Using the following argument

arr = sorted([1,2,41,52,6,57,7,7,71,8,82,83,9,1244,132,1221,14,15,15,1523,19,20] * 10000 
+ [1,2,2,4,5,42,23,7,1,55,21,23,34,24,26,27,6,31,32,33,61,62,70])

# Padraic's:
0.0614181728193

# Mine - Version A2
0.0403025958732

Results (Test 3) - Edit (2)

# Using same argument as Test 2

# Mine - Version A3
0.0338009659857

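After doing these tests, I realized something very important. I believe it is something I had learned before but had forgotten.

When used inside a function, all of these get a significant speed boost. It looks as if the playing field is fairly level. In fact, I'd say it has been leveled all around. Before posting this, I was still running my Version A1 algorithm inside a function and comparing it with Padraic's old algorithm declared at global scope, so I thought I had a bigger advantage until I started retesting each algorithm. According to my data I am still ahead, though not by as much as I thought.

Regarding the speed boost seen inside functions, and the narrowing gap between those versions and my list comprehension: I imagine it may have to do with how list comprehensions optimize themselves very efficiently, while algorithms declared locally (inside a function) undergo similar optimizations.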
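
The function-versus-global effect is easy to reproduce. In CPython, names assigned inside a function are local variables, which are accessed faster than module-level names that go through a dictionary lookup; this is offered as the likely explanation, not something the measurements above prove. In the Python 3 sketch below, `BODY`, `run`, and the repeat counts are illustrative assumptions: the same statements are executed once as module-style code and once inside a function.

```python
import textwrap
import timeit

BODY = """\
singles, repeats = [], []
for i in range(len(arr)):
    prev_same = i > 0 and arr[i] == arr[i - 1]
    next_same = i + 1 < len(arr) and arr[i] == arr[i + 1]
    (repeats if prev_same or next_same else singles).append(arr[i])
result = singles + repeats
"""

data = sorted([1, 2, 3, 3, 3, 4, 5, 6, 7, 7, 7, 7, 8, 8, 8, 9] * 100)

# Module-style execution: every name is looked up in a namespace dict.
module_code = compile(BODY, "<module-style>", "exec")
module_ns = {"arr": data}
t_module = min(timeit.repeat(lambda: exec(module_code, module_ns),
                             number=20, repeat=5))

# The same statements inside a function: the names become fast locals.
func_src = "def run(arr):\n" + textwrap.indent(BODY, "    ") + "    return result\n"
func_ns = {}
exec(func_src, func_ns)
run = func_ns["run"]
t_func = min(timeit.repeat(lambda: run(data), number=20, repeat=5))

assert module_ns["result"] == run(data)  # both variants compute the same list
print("module-style: %.6f s, function: %.6f s" % (t_module, t_func))
```

For a fair comparison between answers, it therefore helps to wrap every candidate in a function before timing it.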

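As Lambeau says in "Good Will Hunting": "So, let this be said: the gauntlet has been thrown down, but [I] have answered, and answered with vigor."

So, what do you say, @Prashant Kumar? Is this answer more computationally efficient? :)

Note: Of course, I welcome anyone (Padraic?) to run their own tests and see how it performs on their own machine, Python version, etc.

It would also be nice if the Python version were shown alongside the results. It could help shed light on the differences in speed.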
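
One way to attach the interpreter details to a set of timings is sketched below; `environment_label` is a hypothetical helper name, and the exact string it returns depends on the machine it runs on.

```python
import platform
import sys

def environment_label():
    """Describe the interpreter so that timings can be compared across setups."""
    return "%s %s on %s" % (platform.python_implementation(),
                            platform.python_version(),
                            sys.platform)

print(environment_label())
```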

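Edit 1:

My conclusion is that Version A2 is probably the fastest I will get. There is always room for minor tweaks, but I doubt they would bring any significant boost, IMO. I do have another version that, with the huge data set used here as the argument, shaves off another 0.002 seconds. Unfortunately, it is slightly (and I mean very slightly) slower at the start. So it doesn't seem worth trading a slight loss of speed on small data sets for a slight gain on very large arguments.

To the OP: I suggest you try some of these algorithms so you can reach your own verdict. I am also very curious to see whether anyone can reproduce some of my results versus @Padraic's. On that note, I also wonder whether this might have to do with the difference between time.clock and timeit?

Edit 2:

OK. I think I have exhausted every avenue of optimization. And it uses only one list instead of two, which is also closer to the OP's goal (he/she was interested in seeing whether this could be pulled off without creating extra empty lists).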
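
The single-list idea can be shown in a much shorter form than Version A3. The sketch below is not the answer's code: it swaps in `itertools.groupby` for the hand-written loops, the name `reorder_one_list` is hypothetical, and it is written for Python 3. It keeps one output list plus an index marking where the singletons end.

```python
from itertools import groupby

def reorder_one_list(arr):
    """Build one output list: out[:split] holds singletons, out[split:] repeats."""
    out = []
    split = 0
    for _, run in groupby(arr):
        run = list(run)
        if len(run) == 1:
            out.insert(split, run[0])  # goes just before the repeated runs
            split += 1
        else:
            out.extend(run)            # repeated runs accumulate at the end
    return out

print(reorder_one_list([1, 2, 3, 3, 3, 4, 5, 6, 7, 7, 7, 7, 8, 8, 8, 9]))
```

The trade-off is that inserting into the middle of a list shifts every later element, so saving the second list does not by itself make the function faster.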