Maintaining hierarchically sorted lists in Python

Time: 2016-02-12 20:56:45

Tags: python arrays list sorting numpy

I'm not sure if 'hierarchical' is the correct way to label this problem, but I have a series of lists of integers that I intend to keep in a 2D numpy array, and I need to keep them sorted in the following way:

list_of_things = [('name', 'lo', 'hi', 'step'),
                  ('name2', 'lo2', 'hi2'),
                  ('name3', 'lo3', 'hi3', 'step3')]

def myFunc(name, lo, hi, step='default'):
    pass                      # do something

def callMyFunc(x):
    for i in x:
        myFunc(*i)            # pass the contents of the tuple as arguments

So the first list is sorted, then the second list is broken into subsections of elements which all have the same value in the first list and those subsections are sorted, and so on down all the lists.

Initially each list will contain only one integer, and I'll then receive new columns that I need to insert into the array in such a way that it remains sorted as discussed above.

The purpose of keeping the lists in this order is that if I'm given a new column of integers I need to check whether an exact copy of that column exists in the array or not as efficiently as possible, and I assume this ordering will help me do it. It may be that there is a better way to make that check than keeping the lists like this - if you have thoughts about that please mention them!

I assume the correct position for a new column can be found by a series of binary searches, but my attempts have been messy. Any thoughts on doing this in a tidy and efficient way?

thanks!

1 answer:

Answer 0 (score: 0)

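If I understand your question correctly, you need to process a series of sequences of numbers, and you need to be able to tell whether the latest sequence is a duplicate of one you have already processed. Currently you're trying to insert each new sequence as a column in a numpy array, but that's awkward, since numpy is really at its best with fixed-size arrays (concatenating or inserting things is always going to be slow).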

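Based on your needs, a better data structure is a set. Membership tests and the addition of new items to a set are both very fast (amortized O(1) time complexity). The only limitation is that a set's items must be hashable (which is true for a tuple, but not for a list or a numpy array).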
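
As a small illustration of the hashability constraint, the sketch below checks whether a value can be stored in a set. The helper `is_hashable` and the sample values are made up for this example and are not part of the question.

```python
def is_hashable(obj):
    """Return True if obj can be stored in a set (i.e. hash() succeeds)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


# Hypothetical sample column, used only for illustration.
column_as_list = [3, 1, 4]
column_as_tuple = tuple(column_as_list)

print(is_hashable(column_as_list))   # a list is mutable, so it is not hashable
print(is_hashable(column_as_tuple))  # a tuple of ints is hashable

seen = {column_as_tuple}
print((3, 1, 4) in seen)  # equal tuples hash alike, so the lookup succeeds
```

A column taken from a numpy array would need the same conversion to a tuple before it could be added to the set.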

Here's an outline of some code you could use:

seen = set()
for seq in sequences:
    tup = tuple(seq) # you only need to make a tuple if seq is not already hashable
    if tup not in seen:
        seen.add(tup)

        # do whatever you want with seq here, it has not been seen before

    else:
        pass # if you want to do something with duplicated sequences, do it here
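
Since the question describes columns arriving one at a time, the same idea can be wrapped in a small object. This is only a sketch under that assumption: the class name `ColumnRegistry`, its `add` method, and the sample columns are invented for illustration.

```python
class ColumnRegistry:
    """Remember every column seen so far and detect exact duplicates."""

    def __init__(self):
        self._seen = set()   # tuples of every column stored so far
        self.columns = []    # unique columns, in arrival order

    def add(self, column):
        """Store column if it is new; return True if stored, False if duplicate."""
        key = tuple(column)  # lists are not hashable, tuples are
        if key in self._seen:
            return False
        self._seen.add(key)
        self.columns.append(key)
        return True


registry = ColumnRegistry()
# Hypothetical incoming columns; the third repeats the first.
results = [registry.add(col) for col in ([1, 2, 3], [1, 2, 4], [1, 2, 3])]
print(results)           # [True, True, False]
print(registry.columns)  # [(1, 2, 3), (1, 2, 4)]
```

If the sorted arrangement is still wanted for some other reason, `sorted(registry.columns)` should produce it, because Python compares tuples element by element: first values are compared first, and ties are broken by the following values.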

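You may also want to look at the unique_everseen recipe in the itertools documentation, which does basically the same thing as the code above, but as a well-optimized generator function.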
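
For reference, here is a simplified sketch in the spirit of that recipe. It is not a verbatim copy of the version in the documentation (which includes further optimizations), and the sample `columns` data is made up for illustration.

```python
def unique_everseen(iterable, key=None):
    """Yield elements in order, skipping any whose key has been seen before."""
    seen = set()
    for element in iterable:
        k = element if key is None else key(element)
        if k not in seen:
            seen.add(k)
            yield element


# Hypothetical columns; key=tuple makes each list hashable for the lookup.
columns = [[1, 2, 3], [1, 2, 4], [1, 2, 3], [0, 5, 5]]
unique_columns = list(unique_everseen(columns, key=tuple))
print(unique_columns)  # [[1, 2, 3], [1, 2, 4], [0, 5, 5]]
```

Passing `key=tuple` leaves the yielded items as the original lists while still using tuples for the membership test.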