Ranking of elements across different lists

Time: 2019-01-06 08:28:16

Tags: python python-3.x numpy

I have 3 lists, as shown below:

List 1  List 2  List 3
A       A       D
D       D       M
GE      M       A
G       G       S
M       S       G
S       GE      GE

Now I need to rank the elements by averaging each element's rank across the lists, as described below:

Elements    Rank-List1  Rank-List2  Rank-List3  Average     Ranking
A               1           1           3        1.67          1
D               2           2           1        1.67          2
GE              3           6           6        5             5
G               4           4           5        4.33          4
M               5           3           2        3.33          3
S               6           5           4        5             6

If two averages tie, the element that comes first gets the higher rank.

So the final output list would be:

Output list
A
D
M
G
GE
S

The average is computed as Average = Sum of Rank (over all lists) / 3:

(1+1+3) / 3 = 1.67 # for A

Can this be done programmatically in Python?

3 answers:

Answer 0: (score: 3)

Use the key parameter of the sorted function:

list1 = ['A', 'D', 'GE', 'G', 'M', 'S']
list2 = ['A', 'D', 'M', 'G', 'S', 'GE']
list3 = ['D', 'M', 'A', 'S', 'G', 'GE']

sorted(list1, key=lambda elem: sum([list1.index(elem), list2.index(elem), list3.index(elem)]) / 3)

Or, for a list of lists:

lists = [['A', 'D', 'GE', 'G', 'M', 'S'],
         ['A', 'D', 'M', 'G', 'S', 'GE'],
         ['D', 'M', 'A', 'S', 'G', 'GE']]

sorted(lists[0], key=lambda elem: sum(sublist.index(elem) for sublist in lists) / len(lists))

Output in both cases above:

['A', 'D', 'M', 'G', 'GE', 'S']
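Both ties in the data (A/D and GE/S) come out in the order the question asks for because Python's sorted is stable: elements with equal keys keep their original relative order, which here is the order of the first list. The key uses 0-based positions rather than 1-based ranks, which shifts every average by the same amount and leaves the order unchanged. A minimal sketch that makes this explicit follows; the helper name rank_by_average is made up for illustration, and it assumes every element appears in every list.

```python
def rank_by_average(lists):
    """Order the elements of lists[0] by their average position across all lists.

    Hypothetical helper, not part of the original answer.
    Assumes every list contains the same elements.
    """
    # One position lookup per list, so each lookup is a dict access
    positions = [{elem: i for i, elem in enumerate(sub)} for sub in lists]

    def average_position(elem):
        return sum(pos[elem] for pos in positions) / len(lists)

    # sorted() is stable: elements with equal averages keep the order of lists[0]
    return sorted(lists[0], key=average_position)


lists = [['A', 'D', 'GE', 'G', 'M', 'S'],
         ['A', 'D', 'M', 'G', 'S', 'GE'],
         ['D', 'M', 'A', 'S', 'G', 'GE']]

print(rank_by_average(lists))
```

Swapping A and D in the first list would swap them in the result as well, since their averages are equal.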

Answer 1: (score: 1)

You can try it like this.

>>> import numpy as np
>>> import pandas as pd
>>>
>>> elements = ["A", "D", "GE", "G", "M", "S"]
>>> rank_list1 = [1, 2, 3, 4, 5, 6]
>>> rank_list2 = [1, 2, 6, 4, 3, 5]
>>> rank_list3 = [3, 1, 6, 5, 2, 4]
>>>
>>> df = pd.DataFrame({
...     "Elements": elements,
...     "Rank-List1": rank_list1,
...     "Rank-List2": rank_list2,
...     "Rank-List3": rank_list3,
... })
>>>
>>> df
  Elements  Rank-List1  Rank-List2  Rank-List3
0        A           1           1           3
1        D           2           2           1
2       GE           3           6           6
3        G           4           4           5
4        M           5           3           2
5        S           6           5           4
>>>
>>> df["Average"] = df.apply(lambda s: s[1:].mean(), axis=1)
>>> df
  Elements  Rank-List1  Rank-List2  Rank-List3   Average
0        A           1           1           3  1.666667
1        D           2           2           1  1.666667
2       GE           3           6           6  5.000000
3        G           4           4           5  4.333333
4        M           5           3           2  3.333333
5        S           6           5           4  5.000000
>>>
>>> df["Average"] = df.apply(lambda s: s[1:].mean().round(2), axis=1)
>>> df
  Elements  Rank-List1  Rank-List2  Rank-List3  Average
0        A           1           1           3     1.67
1        D           2           2           1     1.67
2       GE           3           6           6     5.00
3        G           4           4           5     4.33
4        M           5           3           2     3.33
5        S           6           5           4     5.00
>>>
>>> out = df.sort_values(by="Average")
>>> out
  Elements  Rank-List1  Rank-List2  Rank-List3  Average
0        A           1           1           3     1.67
1        D           2           2           1     1.67
4        M           5           3           2     3.33
3        G           4           4           5     4.33
2       GE           3           6           6     5.00
5        S           6           5           4     5.00
>>>
>>> out.Elements
0     A
1     D
4     M
3     G
2    GE
5     S
Name: Elements, dtype: object
>>>
>>> out.Elements.tolist()
['A', 'D', 'M', 'G', 'GE', 'S']
>>>
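The rank lists in the session above are typed in by hand. As a sketch of how they could be derived from the three original lists instead, assuming every element appears in every list (the helper name ranks_in is hypothetical):

```python
list1 = ['A', 'D', 'GE', 'G', 'M', 'S']
list2 = ['A', 'D', 'M', 'G', 'S', 'GE']
list3 = ['D', 'M', 'A', 'S', 'G', 'GE']

elements = list1  # row order of the table


def ranks_in(lst, elements):
    """Return the 1-based rank of each element within lst (hypothetical helper)."""
    position = {elem: i + 1 for i, elem in enumerate(lst)}
    return [position[elem] for elem in elements]


rank_list1 = ranks_in(list1, elements)
rank_list2 = ranks_in(list2, elements)
rank_list3 = ranks_in(list3, elements)

print(rank_list1)
print(rank_list2)
print(rank_list3)
```

The resulting lists can be passed to the DataFrame constructor exactly as in the session above.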

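Answer 2: (score: 1)

Optimized version of Tomothys solution:

  sorted(list1, key=lambda elem: sum([list1.index(elem), list2.index(elem), list3.index(elem)]) / 3)

That code calls .index() 3 times for each element of list1. Each call iterates over the respective list until it finds the occurrence, so in total you get something like three times sum([1,2,3,4,5,6]), i.e. 63 steps (instead of 18, see below).

The complexity of my solution is determined by O(n), where n = sum(len(item) for item in data) => 18. The cost of sorting is negligible because it only works on the set() of items across all lists, which is much smaller. Timsort needs (in the worst case) O(m*log(m)), where

m = len(set(i for sub in data for i in sub)) => 6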
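The 63 versus 18 figures can be checked with a rough step count. This is only a sketch of the counting argument, assuming that .index() inspects index + 1 items before it returns; the variable names are made up for illustration.

```python
lists = [['A', 'D', 'GE', 'G', 'M', 'S'],
         ['A', 'D', 'M', 'G', 'S', 'GE'],
         ['D', 'M', 'A', 'S', 'G', 'GE']]

# .index(elem) scans from the front, so it inspects index + 1 items per call
index_steps = sum(sub.index(elem) + 1 for elem in lists[0] for sub in lists)

# a single enumerate() pass visits every item of every list exactly once
enumerate_steps = sum(len(sub) for sub in lists)

print(index_steps, enumerate_steps)
```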

from collections import defaultdict

data = [['A', 'D', 'GE', 'G', 'M', 'S'], ['A', 'D', 'M', 'G', 'S', 'GE'],
        ['D', 'M', 'A', 'S', 'G', 'GE']]

d = defaultdict(list)  # or int and use /3.0 implicitly

# this loop touches each element once: O(n), n = sum(length of all lists)
for l in data:
    for idx, value in enumerate(l):
        d[value].append(idx)

# timsort: O(m) to O(m*log(m)) for the much shorter set() over elements of all lists
# sort by score:
result = sorted(d.items(), key=lambda x: sum(x[1]) / float(len(x[1])))
print(*(r for r in result), sep="\n")  # use 'r[0] for r ..' to just print the names

Output:

('A', [0, 0, 2])
('D', [1, 1, 0])
('M', [4, 2, 1])
('G', [3, 3, 4])
('GE', [2, 5, 5])
('S', [5, 4, 3])

If you guarantee that each sublist contains the same elements, just in a different order, you can simplify further:

d = defaultdict(int)

# this loop touches each element once: O(n)
for l in data:
    for idx, value in enumerate(l):
        d[value] += idx

# there is no sense in dividing the sum by 3 if _all_ sums have to be divided by it

# sort by score (the summed positions), not by element name:
result = sorted(d.items(), key=lambda x: x[1])
print(*(r for r in result), sep="\n")

Output:

('A', 2)
('D', 2)
('M', 7)
('G', 10)
('GE', 12)
('S', 12)

defaultdict is faster than a normal dict, but if you dislike imports you can swap it for the slower:

d = {}
d.setdefault(key, []).append(value)  # replaces defaultdict(list)
d[key] = d.get(key, 0) + value       # replaces defaultdict(int)

setdefault(key, default) is slower because it always constructs the default, which takes time; defaultdict(...) is optimized to not need it and is therefore (slightly) faster.
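As a sanity check on the import-free variant, the sketch below builds the same mappings with a plain dict and compares them against the defaultdict version. It says nothing about speed; it only shows that the results agree. The variable names are made up for illustration.

```python
from collections import defaultdict

data = [['A', 'D', 'GE', 'G', 'M', 'S'],
        ['A', 'D', 'M', 'G', 'S', 'GE'],
        ['D', 'M', 'A', 'S', 'G', 'GE']]

# defaultdict version: collect the positions of each element
with_default = defaultdict(list)
for sub in data:
    for idx, value in enumerate(sub):
        with_default[value].append(idx)

# plain dict with setdefault: same position lists, no import needed
plain_lists = {}
for sub in data:
    for idx, value in enumerate(sub):
        plain_lists.setdefault(value, []).append(idx)

# plain dict with get: running sum of positions per element
plain_sums = {}
for sub in data:
    for idx, value in enumerate(sub):
        plain_sums[value] = plain_sums.get(value, 0) + idx

print(plain_lists == dict(with_default))
print(plain_sums)
```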