I have lists like these:
a = [('JoN', 12668, 0.0036), ('JeSsIcA', 1268, 0.0536), ('JoN', 1668, 0.00305), ('King', 16810, 0.005)]
b = [('JoN', 12668, 0.0036), ('JON', 16680, 0.00305), ('MeSSi', 115, 0.369)]
I want the resulting list to look like this:
result = [(('JoN', 12668, 0.0036), ('JoN', 12668, 0.0036)), (('JoN', 1668, 0.00305), ('JON', 16680, 0.00305)), (('King', 16810, 0.005), None), (None, ('MeSSi', 115, 0.369))]
I have tried nested loops, sets, map, and zip, but could not produce this output. Please help.
Answer 0 (score: 2)
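First convert a and b to dictionaries, using the first item (lowercased with str.lower()) together with the third item as the key. Then loop over the union of the two key sets in a list comprehension to get the desired output: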
>>> from pprint import pprint
>>> dct_a = {(x[0].lower(), x[2]): x for x in a}
>>> dct_b = {(x[0].lower(), x[2]): x for x in b}
>>> out = [(dct_a.get(k), dct_b.get(k)) for k in set(dct_a).union(dct_b)]
>>> pprint(out)
[(('JoN', 12668, 0.0036), ('JoN', 12668, 0.0036)),
(('JoN', 1668, 0.00305), ('JON', 16680, 0.00305)),
(('King', 16810, 0.005), None),
(('JeSsIcA', 1268, 0.0536), None),
(None, ('MeSSi', 115, 0.369))]
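Because the keys are iterated through a set union, the order of the rows in out is not guaranteed and may differ from the listing above. The following is a minimal sketch, not part of the original answer, that applies the same dictionary idea but keeps keys in first-seen order. The function name outer_pairs and the helper key are hypothetical names chosen for illustration, and the ordering relies on dictionaries preserving insertion order (Python 3.7+).

```python
a = [('JoN', 12668, 0.0036), ('JeSsIcA', 1268, 0.0536),
     ('JoN', 1668, 0.00305), ('King', 16810, 0.005)]
b = [('JoN', 12668, 0.0036), ('JON', 16680, 0.00305), ('MeSSi', 115, 0.369)]


def outer_pairs(left, right):
    """Pair rows from both lists on (lowercased name, third item)."""
    def key(row):
        return (row[0].lower(), row[2])

    dct_left = {key(x): x for x in left}
    dct_right = {key(x): x for x in right}
    # dict.fromkeys de-duplicates while keeping first-seen order,
    # unlike set().union(), whose iteration order is arbitrary.
    keys = dict.fromkeys(list(dct_left) + list(dct_right))
    return [(dct_left.get(k), dct_right.get(k)) for k in keys]


out = outer_pairs(a, b)
```

As in the answer above, rows that share a key within the same list overwrite one another, so this only suits data where each key appears at most once per list.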
Answer 1 (score: 0)
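from itertools import groupby
from operator import itemgetter

def compose(f, g):
    # Return h such that h(x) == f(*g(x)); g must return a tuple.
    def h(*args, **kwargs):
        return f(*g(*args, **kwargs))
    return h

def lower_first(*args):
    # Lowercase the first field so names compare case-insensitively.
    # str.lower() is used because string.lower exists only in Python 2.
    return (args[0].lower(),) + args[1:]

# Sort by (name, third item, second item); group by (name, third item).
sorting_key = compose(lower_first, itemgetter(0, 2, 1))
grouping_key = compose(lower_first, itemgetter(0, 2))
output = [tuple(v) for k, v in groupby(sorted(a + b, key=sorting_key),
                                       key=grouping_key)]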
This gives output as:
[(('JeSsIcA', 1268, 0.0536),),
(('JoN', 1668, 0.00305), ('JON', 16680, 0.00305)),
(('JoN', 12668, 0.0036), ('JoN', 12668, 0.0036)),
(('King', 16810, 0.005),),
(('MeSSi', 115, 0.369),)]
Adding the None values is then straightforward:
final_output = [ elem if len(elem) >= 2
else ((None,)+ elem) if elem[0] not in a else elem + (None,)
for elem in output
]
This gives:
[(('JeSsIcA', 1268, 0.0536), None),
(('JoN', 1668, 0.00305), ('JON', 16680, 0.00305)),
(('JoN', 12668, 0.0036), ('JoN', 12668, 0.0036)),
(('King', 16810, 0.005), None),
(None, ('MeSSi', 115, 0.369))]
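One caveat with the padding step: it decides which side a lone row came from by testing elem[0] not in a, and it treats any group of two or more rows as a matched pair even if both rows came from the same list. The following is a minimal sketch of a variant, not the original answer's code, that tags each row with its source list before grouping. The names key, tagged, and pairs are hypothetical.

```python
from itertools import groupby

a = [('JoN', 12668, 0.0036), ('JeSsIcA', 1268, 0.0536),
     ('JoN', 1668, 0.00305), ('King', 16810, 0.005)]
b = [('JoN', 12668, 0.0036), ('JON', 16680, 0.00305), ('MeSSi', 115, 0.369)]


def key(row):
    # Join key: lowercased name plus the third item.
    return (row[0].lower(), row[2])


# Tag each row with its source: 0 for a, 1 for b.
tagged = [(0, row) for row in a] + [(1, row) for row in b]
# groupby only merges adjacent items, so sort by the join key first.
tagged.sort(key=lambda t: key(t[1]))

pairs = []
for _, group in groupby(tagged, key=lambda t: key(t[1])):
    slots = [None, None]
    for source, row in group:
        # A later row from the same source overwrites an earlier one.
        slots[source] = row
    pairs.append(tuple(slots))
```

With the sample data this produces the same five pairs as final_output, in the same order.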
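Be careful, though: posing this kind of problem in terms of lists often hides what is really a relational join, which is better handled by a system with proper indexing. For example, pandas.DataFrame is probably a better fit because of its native join and merge capabilities.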
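As a rough illustration of that suggestion, the sketch below expresses the same pairing as an outer merge in pandas. It assumes pandas is installed; the column names name, count, and ratio and the helper column key are hypothetical, since the question does not name the fields.

```python
import pandas as pd

a = [('JoN', 12668, 0.0036), ('JeSsIcA', 1268, 0.0536),
     ('JoN', 1668, 0.00305), ('King', 16810, 0.005)]
b = [('JoN', 12668, 0.0036), ('JON', 16680, 0.00305), ('MeSSi', 115, 0.369)]

cols = ['name', 'count', 'ratio']  # hypothetical column names
df_a = pd.DataFrame(a, columns=cols)
df_b = pd.DataFrame(b, columns=cols)

# Case-insensitive join key derived from the name column.
df_a['key'] = df_a['name'].str.lower()
df_b['key'] = df_b['name'].str.lower()

# how='outer' keeps unmatched rows from both sides (filled with NaN);
# indicator=True adds a _merge column: both, left_only, or right_only.
merged = df_a.merge(df_b, on=['key', 'ratio'], how='outer',
                    suffixes=('_a', '_b'), indicator=True)
print(merged)
```

Unlike the dictionary approach, a merge keeps every combination when a key repeats within one frame, so duplicates are not silently dropped. Note that joining on a floating-point column relies on the values being exactly equal.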