Sorting characters across an array while preserving the order of each row (Python)

Time: 2018-10-05 10:24:58

Tags: python arrays sorting

I am trying to create a new sorted list of characters that follows the order of each row of the following array:

import numpy as np

array = np.array([['t', 'c', 'k', 's', 'x', 'f', 'b'],
                  ['t', 'c', 'l', 'u', 's', 'z', 'f'],
                  ['w', 't', 'l', 'u', 'k', 's', 'n']])

I would like the new list to look like ['w' 't' 'c' 'l' 'u' 'k' 's'.....]

My approach was to write a list comprehension:

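myprev = set()
# keep each character the first time it is seen, scanning row by row;
# myprev.add() returns None, so "or True" keeps the condition truthy
newalpha = [elem for row in array for elem in row if elem not in myprev and (myprev.add(elem) or True)]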

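But the order is not respected in my result: in the third row, w appears before t, which starts the first two rows. So I want w to stay at the beginning of the list, not near the end as it is in my result: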

['t', 'c', 'k', 's', 'x', 'f', 'b', 'l', 'u', 'z', 'w', 'n'] 
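Why this happens: the comprehension only deduplicates. Each character lands where it is first met when scanning row 1, then row 2, then row 3, so w, which is first met in row 3, can only end up near the end. A minimal sketch of an equivalent idiom (assuming a plain list of lists named `rows` instead of the NumPy array) makes that explicit:

```python
rows = [['t', 'c', 'k', 's', 'x', 'f', 'b'],
        ['t', 'c', 'l', 'u', 's', 'z', 'f'],
        ['w', 't', 'l', 'u', 'k', 's', 'n']]

# dicts keep insertion order (guaranteed since Python 3.7), so this keeps each
# character at its first appearance in row-major order, like the comprehension
first_seen = list(dict.fromkeys(ch for row in rows for ch in row))
print(first_seen)
# ['t', 'c', 'k', 's', 'x', 'f', 'b', 'l', 'u', 'z', 'w', 'n']
```

No ordering constraint between rows is ever compared here, which is why a real sort (or topological sort) is needed.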

1 answer:

Answer 0 (score: 1)

I believe what you are asking for is:

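  1. Take newalpha to be the set of all letters found anywhere in the array.
  2. Sort newalpha so that whenever two letters are found in the same row of the array, they appear in newalpha in the same order as in that row.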
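The two conditions can be turned into a checker for any candidate ordering. This is a minimal sketch; `respects_rows` is a hypothetical helper name, and the second list in the usage lines is one ordering checked by hand, not necessarily what the code below produces.

```python
def respects_rows(order, rows):
    """Return True if `order` satisfies both conditions for `rows`."""
    letters = {ch for row in rows for ch in row}
    # condition 1: every letter from the array appears exactly once
    if len(order) != len(set(order)) or set(order) != letters:
        return False
    position = {ch: i for i, ch in enumerate(order)}
    # condition 2: each row's order is kept; checking adjacent pairs is enough,
    # because "a before b" and "b before c" imply "a before c"
    for row in rows:
        for a, b in zip(row, row[1:]):
            if position[a] > position[b]:
                return False
    return True


rows = [['t', 'c', 'k', 's', 'x', 'f', 'b'],
        ['t', 'c', 'l', 'u', 's', 'z', 'f'],
        ['w', 't', 'l', 'u', 'k', 's', 'n']]
print(respects_rows(['t', 'c', 'k', 's', 'x', 'f', 'b', 'l', 'u', 'z', 'w', 'n'], rows))  # False
print(respects_rows(['w', 't', 'c', 'l', 'u', 'k', 's', 'x', 'z', 'f', 'b', 'n'], rows))  # True
```

The question's result fails condition 2 more than once: besides w coming after t, row 2 requires l and u before s.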

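As I said in the comments, this is not always possible (e.g. array = [['a','b'],['b','a']]), and when it is possible, the ordering is not necessarily unique. In the code below, ties are broken by the row in which each character first appears. If there is no solution at all, I make no guarantees about how this code behaves.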

import numpy as np

array=[['t', 'c', 'k', 's', 'x', 'f', 'b'],
       ['t', 'c', 'l', 'u', 's', 'z', 'f'],
       ['w', 't', 'l', 'u', 'k', 's', 'n']]

# create the set of characters
# initially, this is sorted by the first row that the character is found in
# (and then by order within the row)
chset = set()
chars = [ch for row in array for ch in row if ch not in chset and (chset.add(ch) or True)]

# array of comparisons
# a 1 in position i, j means chars[i] comes before chars[j]
# a -1 in position i, j means chars[j] comes before chars[i]
# a 0 in position i, j means we don't know yet, or i == j
# we should end with the only zeros being on the diagonal
comparisons = np.zeros((len(chars), len(chars)))


for row in array:
    for i in range(len(row)):
        i_index = chars.index(row[i])
        for j in range(i+1, len(row)):
            j_index = chars.index(row[j])
            comparisons[i_index, j_index] = 1
            comparisons[j_index, i_index] = -1

changes_made = True
while changes_made:
    changes_made = False
    # extend through transitivity:
    # if we know chars[i] is before chars[k] is before chars[j], then chars[i] is before chars[j]
    for i in range(len(chars)):
        for j in range(i + 1, len(chars)):
            if comparisons[i, j] == 0:
                for k in range(len(chars)):
                    if comparisons[i, k] == 1 and comparisons[k, j] == 1:
                        comparisons[i, j] = 1
                        comparisons[j, i] = -1
                        changes_made = True
                        break
                    elif comparisons[i, k] == -1 and comparisons[k, j] == -1:
                        comparisons[i, j] = -1
                        comparisons[j, i] = 1
                        changes_made = True
                        break
    if not changes_made:
        # we've extended transitively as much as we can
        # as a tiebreaker, use the first rows that chars[i] and chars[j] were found in
        # which is the order chars is currently in
        for i in range(len(chars)):
            for j in range(i + 1, len(chars)):
                if comparisons[i, j] == 0:
                    comparisons[i, j] = 1
                    comparisons[j, i] = -1
                    changes_made = True
                    break
            if changes_made:
                break

# convert the table of comparisons into a single number per character by summing each column:
# the first character has -1s everywhere in its column (except the diagonal), so gets the lowest score (-11, since there are 12 characters total)
# the second character has -1s everywhere in its column except in the row corresponding to the first character, so gets score -9
# etc
scores = np.sum(comparisons, axis=0)  

# sort chars by score
result = [pair[1] for pair in sorted(enumerate(chars), key=lambda pair: scores[pair[0]])]
print(result)
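The same requirement can also be read as a topological sort: letters are nodes, and each pair of adjacent letters in a row is an edge "a must come before b". Below is a sketch using Kahn's algorithm, which is a different technique from the comparison table above. It assumes ties should be broken by first appearance (row-major), uses a heap for that, and raises ValueError when the rows contradict each other. `row_order_sort` is a hypothetical name, and its tie-breaking may not match the answer's code in every case.

```python
import heapq


def row_order_sort(rows):
    """Order all letters so every row's relative order is kept.

    Ties go to the letter that appears first in row-major order.
    Raises ValueError if the rows impose contradictory orders (a cycle).
    """
    # index of each letter's first appearance, used as the tie-breaker
    first_seen = {}
    for row in rows:
        for ch in row:
            first_seen.setdefault(ch, len(first_seen))

    # edge a -> b for each adjacent pair; the rest follows by transitivity
    successors = {ch: set() for ch in first_seen}
    for row in rows:
        for a, b in zip(row, row[1:]):
            if a != b:  # skip a letter repeated back-to-back (self-loop)
                successors[a].add(b)

    indegree = {ch: 0 for ch in first_seen}
    for a in successors:
        for b in successors[a]:
            indegree[b] += 1

    # letters with no unplaced predecessors, smallest first_seen index first
    ready = [(first_seen[ch], ch) for ch in first_seen if indegree[ch] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, ch = heapq.heappop(ready)
        order.append(ch)
        for nxt in successors[ch]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                heapq.heappush(ready, (first_seen[nxt], nxt))

    # letters stuck on a cycle never reach indegree 0
    if len(order) != len(first_seen):
        raise ValueError("rows impose contradictory orders (cycle)")
    return order


rows = [['t', 'c', 'k', 's', 'x', 'f', 'b'],
        ['t', 'c', 'l', 'u', 's', 'z', 'f'],
        ['w', 't', 'l', 'u', 'k', 's', 'n']]
print(row_order_sort(rows))
# ['w', 't', 'c', 'l', 'u', 'k', 's', 'x', 'z', 'f', 'b', 'n']
```

Unlike the comparison table, this reports the impossible case explicitly: row_order_sort([['a', 'b'], ['b', 'a']]) raises ValueError instead of returning an arbitrary order.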