Question

I want to sort a list of tuples like this:

rows = [ ('A', 'a', 1, '?'),
     ('A', 'a', 1, '!'),
     ('A', 'a', 1, '#'),
     ('A', 'b', 1, '#'),
     ('A', 'b', 2, '$'),
     ('A', 'c', 2, '@'),
     ('A', 'd', 3, '@') ]

by this frequency pattern:

- we have 1 value 'A' at index [0]
- we have 4 values 'a', 'b', 'c', 'd' at index [1]
- we have 3 values 1,2,3 at index [2]
- we have 5 values '?', '!', '#', '$', '@' at index [3]

So the sorted list should look like this:

rows = [ ('A', 1, 'a', '?'),
     ('A', 1, 'a', '!'),
     ('A', 1, 'a', '#'),
     ('A', 1, 'b', '#'),
     ('A', 2, 'b', '$'),
     ('A', 2, 'c', '@'),
     ('A', 3, 'd', '@') ]

How can I do this elegantly?

Answer 1

Transpose your rows to columns, sort the columns by the length of their set (the unique count), then transpose back:

list(zip(*sorted(zip(*rows), key=lambda c: len(set(c)))))

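zip(*nested_list) returns the columns of all rows in nested_list, provided the rows all have the same length (if any row is shorter than the others, the remaining columns are dropped).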
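To make the truncation behaviour concrete, here is a minimal sketch. The three-row list is made up for illustration and is not the data from the question:

```python
# Hypothetical rows; the last one is deliberately one item short.
rows = [('A', 'a', 1), ('B', 'b', 2), ('C', 'c')]

# zip(*rows) groups the i-th item of every row, giving one tuple per column.
# list() is needed on Python 3, where zip returns a lazy iterator.
columns = list(zip(*rows))

# zip stops at the shortest row, so the third column is dropped.
print(columns)  # [('A', 'B', 'C'), ('a', 'b', 'c')]
```

If rows of unequal length are possible, itertools.zip_longest pads the short ones instead of dropping columns.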

This moves the second column to the right of the third, because it has more unique values (4 versus 3).
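The same idea can be spelled out step by step, which may help when checking why a column ends up where it does. This is only a sketch of the approach above; the names unique_counts, order and reordered are introduced here for illustration:

```python
rows = [('A', 'a', 1, '?'),
        ('A', 'a', 1, '!'),
        ('A', 'a', 1, '#'),
        ('A', 'b', 1, '#'),
        ('A', 'b', 2, '$'),
        ('A', 'c', 2, '@'),
        ('A', 'd', 3, '@')]

# One tuple per column, then the number of distinct values in each.
columns = list(zip(*rows))
unique_counts = [len(set(col)) for col in columns]
print(unique_counts)  # [1, 4, 3, 5]

# Column indices ordered by unique count. sorted() is stable, so columns
# with equal counts keep their original relative order.
order = sorted(range(len(columns)), key=lambda i: unique_counts[i])
print(order)  # [0, 2, 1, 3]

# Rebuild every row with its items in the new column order.
reordered = [tuple(row[i] for i in order) for row in rows]
print(reordered[0])  # ('A', 1, 'a', '?')
```

Keeping the index list around is a design choice: it lets you apply the same column order to other rows later, which the one-liner does not.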

Demo:

>>> rows = [ ('A', 'a', 1, '?'),
...      ('A', 'a', 1, '!'),
...      ('A', 'a', 1, '#'),
...      ('A', 'b', 1, '#'),
...      ('A', 'b', 2, '$'),
...      ('A', 'c', 2, '@'),
...      ('A', 'd', 3, '@') ]
>>> list(zip(*sorted(zip(*rows), key=lambda c: len(set(c)))))
[('A', 1, 'a', '?'), ('A', 1, 'a', '!'), ('A', 1, 'a', '#'), ('A', 1, 'b', '#'), ('A', 2, 'b', '$'), ('A', 2, 'c', '@'), ('A', 3, 'd', '@')]
>>> from pprint import pprint
>>> pprint(_)
[('A', 1, 'a', '?'),
 ('A', 1, 'a', '!'),
 ('A', 1, 'a', '#'),
 ('A', 1, 'b', '#'),
 ('A', 2, 'b', '$'),
 ('A', 2, 'c', '@'),
 ('A', 3, 'd', '@')]

Answer 2

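If you are willing or interested in doing this with the pandas library, see below. I personally vote that this is strictly inferior to the zip and sorted (with key) solution, or possibly one using collections.Counter, but it's here anyway.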

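import pandas

# DataFrame.sort was removed from pandas; sort_values is its replacement.
df = pandas.DataFrame(rows).sort_values([0, 1, 2], ascending=[True, True, True])
# Column positions ordered by the number of unique values in each column.
col_order = df.apply(lambda x: x.nunique()).argsort().values.tolist()
# list() is needed on Python 3, where map returns a lazy iterator.
list(map(tuple, df[col_order].values.tolist()))

E.g.:

In [30]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:df = pandas.DataFrame(rows).sort_values([0, 1, 2], ascending=[True, True, True])
:col_order = df.apply(lambda x: x.nunique()).argsort().values.tolist()
:list(map(tuple, df[col_order].values.tolist()))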
:--
Out[30]: 
[('A', 1, 'a', '?'),
 ('A', 1, 'a', '!'),
 ('A', 1, 'a', '#'),
 ('A', 1, 'b', '#'),
 ('A', 2, 'b', '$'),
 ('A', 2, 'c', '@'),
 ('A', 3, 'd', '@')]

Sorting a list of tuples by frequency in Python
