Sorting a list of tuples so that one tuple's first element equals another tuple's second element

Time: 2018-02-16 15:06:10

Tags: python optimization

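I have a list of tuples representing points (x, y), and I want to sort them so that the x_i of each point p_i equals the y_j of the point p_j just before it. The points are such that neither x nor y repeats between points: for example, given the point (1, 2), no point (1, y) or (x, 2) is allowed, for any x and y. For example:

points = [(1, 5), (3, 4), (5, 3), (4, 1), (7,2), (2, 6)]  # valid points

should be ordered as

[(1, 5), (5, 3), (3, 4), (4, 1), (7, 2), (2, 6)]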

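My code compares the points using two nested for loops. Unfortunately, its complexity is O(N^2), so it is slow for a large number of points. Is there a faster way to do this?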
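The ordering criterion can be checked mechanically. Below is a minimal sketch with a hypothetical helper, count_breaks (not part of the question), that counts adjacent pairs where a point's y differs from the next point's x. For the example, the desired order has a single break, where the cycle 1 -> 5 -> 3 -> 4 -> 1 ends and the chain 7 -> 2 -> 6 begins.

```python
def count_breaks(ordered):
    """Count adjacent pairs where one point's y differs from the next point's x."""
    return sum(1 for (_, y), (x, _) in zip(ordered, ordered[1:]) if y != x)


points = [(1, 5), (3, 4), (5, 3), (4, 1), (7, 2), (2, 6)]
desired = [(1, 5), (5, 3), (3, 4), (4, 1), (7, 2), (2, 6)]

# The desired order contains exactly the same points, only rearranged
assert sorted(desired) == sorted(points)

print(count_breaks(points))   # 4
print(count_breaks(desired))  # 1
```

Any correct answer should return a permutation of the input with as few breaks as possible; for this input that is one break.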

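4 answers:

Answer 0 (score: 2)

Treat the unordered list as the description of a directed graph in which every node belongs to exactly one chain; then you can use the following abstraction.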

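points = [(1, 5), (3, 4), (5, 3), (4, 1), (7,2), (2, 6)]

# Each point (x, y) is an edge x -> y. Because x and y values never repeat,
# every node has at most one successor and one predecessor, so the edges
# form disjoint chains (open paths or closed cycles).
graph = dict(points)
reverse = {y: x for x, y in points}
chains, seen = [], set()

# Find the chains in the graph
for start in graph:
    if start in seen:
        continue
    # Walk back to the head of this chain first, so a chain is never
    # entered halfway (stop after going once round a cycle)
    head = start
    while head in reverse:
        head = reverse[head]
        if head == start:
            break
    # Follow the chain forwards until it ends or closes into a cycle
    node = head
    while node in graph and node not in seen:
        seen.add(node)
        chains.append((node, graph[node]))
        node = graph[node]

# chains : [(1, 5), (5, 3), (3, 4), (4, 1), (7, 2), (2, 6)]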

This gives us an algorithm that runs in O(n), because each node is added to seen (and appended to chains) only once.

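Answer 1 (score: 1)

You can turn the search into an O(1) operation by caching lists of points that share the same first item (building the cache takes O(N) time). The code for this is a bit tricky, mostly because it has to keep track of which items have already been processed, but it should run quickly. Here is an example: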

from collections import defaultdict, deque

points = [(1, 5), (3, 4), (5, 3), (4, 1), (1,6), (7,2), (3,4), (2,3)]

# make a dictionary of lists of points, grouped by first element
cache = defaultdict(deque)
for i, p in enumerate(points):
    cache[p[0]].append(i)

# keep track of all points that will be processed
points_to_process = set(range(len(points)))

i = 0
next_idx = i
ordered_points = []
while i < len(points):
    # get the next point to be added to the ordered list
    cur_point = points[next_idx]
    ordered_points.append(cur_point)
    # remove this point from the cache with popleft();
    # it is always the first one in the corresponding deque.
    # Pop outside the assert: asserts are skipped under `python -O`.
    popped = cache[cur_point[0]].popleft()
    assert popped == next_idx
    points_to_process.discard(next_idx)
    # find the next item to add to the list
    try:
        # get the first remaining point that matches this
        next_idx = cache[cur_point[1]][0]
    except IndexError:
        # no matching point; advance to the next unprocessed one
        while i < len(points):
            if i in points_to_process:
                next_idx = i
                break
            else:
                i += 1

ordered_points
# [(1, 5), (5, 3), (3, 4), (4, 1), (1, 6), (7, 2), (2, 3), (3, 4)]

You can avoid creating the points_to_process set to save memory (and possibly time), but the code becomes more complicated:

from collections import defaultdict, deque

points = [(1, 5), (3, 4), (5, 3), (4, 1), (1,6), (7,2), (3,4), (2,3)]

# make a dictionary of lists of points, grouped by first element
cache = defaultdict(deque)
for i, p in enumerate(points):
    cache[p[0]].append(i)

i = 0
next_idx = i
ordered_points = []
while i < len(points):
    # get the next point to be added to the ordered list
    cur_point = points[next_idx]
    ordered_points.append(cur_point)
    # remove this point from the cache
    # note: it will always be the first one in the corresponding deque
    # (pop outside the assert, which is skipped under `python -O`)
    popped = cache[cur_point[0]].popleft()
    assert popped == next_idx
    # find the next item to add to the list
    try:
        next_idx = cache[cur_point[1]][0]
    except IndexError:
        # advance to the next unprocessed point
        while i < len(points):
            # every index below i is already processed, so i is unprocessed
            # exactly when it is still at the front of its deque
            bucket = cache[points[i][0]]
            if bucket and bucket[0] == i:
                next_idx = i
                break
            # no longer available, move on to next point
            i += 1

ordered_points
# [(1, 5), (5, 3), (3, 4), (4, 1), (1, 6), (7, 2), (2, 3), (3, 4)]

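Answer 2 (score: 1)

Thanks, everyone, for the help. Here is my own solution, using numpy and a while loop (much slower than Matthias Fripp's solution, but faster than the two for loops in the question's code):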

import numpy as np

# example of points
points = [(1, 5), (17, 2),(3, 4), (5, 3), (4, 1), (6, 8), (9, 7), (2, 6)]  

points = np.array(points)
x, y = points[:,0], points[:,1]

N = points.shape[0]
i = 0
idx = [0]
remaining = set(range(1, N))
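while len(idx) < N:
    try:
        # index of the point whose x equals the current point's y
        i = np.where(x == y[i])[0][0]
        if i in remaining:
            remaining.remove(i)
        else:
            # successor already used: continue from another unused point
            i = remaining.pop()
    except IndexError:
        # no successor: continue from another unused point
        # (set.pop() returns an arbitrary element)
        i = remaining.pop()

    idx.append(i)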

list(zip(points[idx][:,0], points[idx][:,1]))
# [(1, 5), (5, 3), (3, 4), (4, 1), (17, 2), (2, 6), (6, 8), (9, 7)]
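In the loop above, np.where(x == y[i]) scans the whole x array at every step, so the overall cost is still O(N^2). One possible refinement, sketched here under the question's assumption that x values are unique (this is not the answerer's code, and the helper names successor_indices and order_points are hypothetical): compute every point's successor index in one pass with np.argsort and np.searchsorted (O(N log N)), then follow those indices in plain Python, walking back to each chain's head first so a chain is never entered halfway.

```python
import numpy as np


def successor_indices(points):
    """Return, for each point i, the index j with x[j] == y[i], or -1 if none.

    Assumes the x values are unique, as the question guarantees.
    """
    pts = np.asarray(points)
    x, y = pts[:, 0], pts[:, 1]
    order = np.argsort(x)               # indices that sort x
    sorted_x = x[order]
    pos = np.searchsorted(sorted_x, y)  # where each y would be inserted
    pos = np.minimum(pos, len(x) - 1)   # a y above every x would be out of range
    found = sorted_x[pos] == y
    return np.where(found, order[pos], -1)


def order_points(points):
    """Order points so that each point's y equals the next point's x where possible."""
    if not points:
        return []
    succ = successor_indices(points)
    n = len(points)
    # Invert the successor map: pred[j] = i when succ[i] == j
    pred = np.full(n, -1)
    has_succ = succ >= 0
    pred[succ[has_succ]] = np.flatnonzero(has_succ)
    visited = np.zeros(n, dtype=bool)
    result = []
    for start in range(n):
        if visited[start]:
            continue
        # Walk back to the head of this chain (or once round a cycle)
        head = start
        while pred[head] >= 0:
            head = pred[head]
            if head == start:
                break
        # Follow successor indices until the chain ends or closes
        i = head
        while i >= 0 and not visited[i]:
            visited[i] = True
            result.append(points[i])
            i = succ[i]
    return result


points = [(1, 5), (17, 2), (3, 4), (5, 3), (4, 1), (6, 8), (9, 7), (2, 6)]
print(order_points(points))
# [(1, 5), (5, 3), (3, 4), (4, 1), (17, 2), (2, 6), (6, 8), (9, 7)]
```

Chains are emitted in the order in which their first point appears in the input, so for this example the result matches the output shown in the answer, without depending on which element set.pop() happens to return.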

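Answer 3 (score: 0)

A recursive divide-and-conquer approach might have a better running time. Since this is not a simple sorting problem, you can't just put together a modified quicksort or the like. I think a good solution would be a merge-based algorithm. Here is some pseudocode that might help.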

let points = [(1, 5), (3, 4), (5, 3), (4, 1), (1,6), (7,2), (3,4), (2,3)];
function tupleSort(tupleList):
    if length(tupleList) <= 1:
        return tupleList
    if length(tupleList) == 2:
        //Trivial solution. Only two tuples in the list. They are either
        //swapped or left in place
        if tupleList[0].x == tupleList[1].y:
            return reverse(tupleList)
        else:
            return tupleList
    else:
        let length = length(tupleList)
        let mid = floor(length / 2)
        //Half-open ranges [0, mid) and [mid, length), so no element is skipped
        let firstHalf = tupleSort(tupleList[0 -> mid])
        let secondHalf = tupleSort(tupleList[mid -> length])
        return merge(firstHalf, secondHalf)

function merge(firstList, secondList):
    indexOfUnsorted = getNotSorted(firstList)
    if indexOfUnsorted > -1:
        //iterate through the second list, find an x item
        //that matches the y of the first list, and insert the
        //second list into the first list at that position
        return mergedLists
    else:
        return append(firstList, secondList)

function getNotSorted(list):
     //iterate once through the list and return -1 if sorted;
     //otherwise return the index of the first item whose y value
     //is not equal to the next item's x value