Merging n sorted lists of tuples in Python

Time: 2011-02-15 04:46:17

Tags: python algorithm sorting tuples merge

I have n lists (n < 10) of tuples in the format [(ListID, [(index, value), (index, value), ...])] and want to sort them by index to get the following result.

Example Input:
[('A',[(0.12, 'how'),(0.26,'are'),(0.7, 'you'),(0.9,'mike'),(1.9, "I'm fine too")]),
('B',[(1.23, 'fine'),(1.50, 'thanks'),(1.6,'and you')]),
('C',[(2.12,'good'),(2.24,'morning'),(3.13,'guys')])]

Desired Output:
[('A', ( 0.12, 'how')),
('A', ( 0.26, 'are')),
('A', ( 0.7, 'you')),
('A', ( 0.9, 'mike')),
('B',(1.23, 'fine')),
('B',(1.50, 'thanks')),
('B',(1.6,'and you')),
('A', (1.9, "I'm fine too")),
('C',(2.12,'good')),
('C',(2.24,'morning')),
('C',(3.13,'guys'))]   

I know the code is ugly, especially those item[0][-1][1] indexes, but can someone tell me what I am doing wrong?

content = []    
max = 0.0
first = True 
Done = False
finished = []
while not Done:
    for item in flow:
        if len(finished) == 4:
            Done = True
            break
        if len(item[1]) == 0:
            if item[0] not in finished:
                finished.append(item[0])
            continue
        if first == True:
            max = item[1][-1][0]
            content.append((item[0], item[1].pop()))
            first = False 
            continue
        if item[1][-1][0] > max:
            max = item[1][-1][0]
            content.append((item[0], item[1].pop()))
            content = sorted(content, key=itemgetter(1))    

    first = True    

Update: Thanks, everyone.

4 answers:

Answer 0 (score: 5)

>>> from operator import itemgetter
>>> import pprint
>>> pprint.pprint(sorted(((i,k) for i,j in INPUT for k in j), key=itemgetter(1)))
[('A', (0.12, 'how')),
 ('A', (0.26000000000000001, 'are')),
 ('A', (0.69999999999999996, 'you')),
 ('A', (0.90000000000000002, 'mike')),
 ('B', (1.23, 'fine')),
 ('B', (1.5, 'thanks')),
 ('B', (1.6000000000000001, 'and you')),
 ('A', (1.8999999999999999, "I'm fine")),
 ('C', (2.1200000000000001, 'good')),
 ('C', (2.2400000000000002, 'morning')),
 ('C', (3.1299999999999999, 'guys'))]

There are two main things going on here.

[(i,k) for i,j in INPUT for k in j]

converts INPUT into this structure:

[('A', (0.12, 'how')),
 ('A', (0.26, 'are')),
 ('A', (0.7, 'you')),
 ('A', (0.9, 'mike')),
 ('A', (1.9, "I'm fine")),
 ('B', (1.23, 'fine')),
 ('B', (1.5, 'thanks')),
 ('B', (1.6, 'and you')),
 ('C', (2.12, 'good')),
 ('C', (2.24, 'morning')),
 ('C', (3.13, 'guys'))]

sorted(L, key=itemgetter(1))

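sorts L by item [1] of each element. Those items are actually (0.12, 'how'), (0.26, 'are'), ... but Python normally compares tuples from left to right, so we don't need to do any extra work to strip the words out of the tuples.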
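As a small illustration of that left-to-right comparison, here is a minimal sketch. The `records` list is made-up sample data in the flattened shape shown above, not the full input from the question.

```python
from operator import itemgetter

# Hypothetical sample records in the flattened (ListID, (index, value)) shape.
records = [
    ('B', (1.23, 'fine')),
    ('A', (0.12, 'how')),
    ('A', (1.9, "I'm fine too")),
    ('C', (2.12, 'good')),
]

# Tuples compare element by element, from left to right.
assert (0.12, 'how') < (1.23, 'fine')   # decided by the first element
assert (1.5, 'a') < (1.5, 'b')          # a tie on the index falls back to the value

# itemgetter(1) picks the (index, value) pair as the sort key.
by_index = sorted(records, key=itemgetter(1))
print(by_index)
```

One caveat: if two indexes tie and the values are not comparable with each other, Python 3 raises a TypeError. Sorting with `key=lambda r: r[1][0]` compares the index alone and avoids that.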

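Answer 1 (score: 2)

(OK, the sample data makes the problem description much clearer. Answer revised accordingly.)

Step 1: Clarify your problem description by reverse engineering your current solution.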

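  1. There are 4 different data sets, labelled A, B, C and D
  2. These data sets are contained in a series of 2-tuples of the form (ListID, elements)
  3. Each "elements" entry is itself a list of 2-tuples of the form (index, value)
  4. An empty elements entry indicates the end of a data set
  5. The goal is to merge these data sets into a single sorted list of 2-tuples of the form (ListID, (index, value))

Step 2: Transform the input data to create individual records in the desired form.

Generators are built for this kind of thing, so it makes sense to define one.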

    def get_data(flow, num_data_sets=4):
        finished = set()
        for list_id, elements in flow:
            if list_id in finished:
                continue
            if not elements:
                finished.add(list_id)
                if len(finished) == num_data_sets:
                    break
                continue
            for element in elements:
                yield list_id, element
    

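Step 3: Use sorted to produce the desired ordered list. Without a key the records sort by ListID first; pass key=itemgetter(1) to order them by index instead.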

    content = sorted(get_data(flow))
    

Sample usage:

    # get_data defined via copy/paste of source code above
    # ref_data taken from the revised question
    >>> demo_data = [
    ...   ('A', [(1, 2), (3, 4)]),
    ...   ('B', [(7, 8), (9, 10)]),
    ...   ('A', [(0, 0)]),
    ...   ('C', []), # Finish early
    ...   ('C', [('ignored', 'entry')])
    ... ]
    >>> content = sorted(get_data(demo_data))
    >>> print('\n'.join(map(str, content)))
    ('A', (0, 0))
    ('A', (1, 2))
    ('A', (3, 4))
    ('B', (7, 8))
    ('B', (9, 10))
    >>> from operator import itemgetter
    >>> content = sorted(get_data(ref_data), key=itemgetter(1))
    >>> print('\n'.join(map(str, content)))
    ('A', (0.12, 'how'))
    ('A', (0.26, 'are'))
    ('A', (0.7, 'you'))
    ('A', (0.9, 'mike'))
    ('B', (1.23, 'fine'))
    ('B', (1.5, 'thanks'))
    ('B', (1.6, 'and you'))
    ('A', (1.9, "I'm fine too"))
    ('C', (2.12, 'good'))
    ('C', (2.24, 'morning'))
    ('C', (3.13, 'guys'))
    

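Your solution ends up messy and hard to read for two main reasons:

  1. Not using a generator means you don't get the full benefit of the builtin sorted function
  2. By using indexing instead of tuple unpacking, you make it very hard to keep track of what is what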
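Because each data set in the question is already sorted by index, the generator and tuple-unpacking ideas can also be combined with a lazy merge. The sketch below swaps sorted for the standard library's heapq.merge, which is not what this answer uses. The helper names `tagged` and `merge_flows` are hypothetical, the `flow` data is an abbreviated copy of the question's input, and the `key` argument assumes Python 3.5 or later.

```python
import heapq
from operator import itemgetter


def tagged(list_id, elements):
    """Yield (list_id, (index, value)) records for one already-sorted data set."""
    for element in elements:
        yield list_id, element


def merge_flows(flow):
    """Lazily merge data sets that are each already sorted by index."""
    streams = [tagged(list_id, elements) for list_id, elements in flow]
    # heapq.merge only looks at the head of each stream, so it never
    # needs to re-sort the combined data.
    return heapq.merge(*streams, key=itemgetter(1))


# Abbreviated sample input in the question's format.
flow = [
    ('A', [(0.12, 'how'), (0.26, 'are'), (1.9, "I'm fine too")]),
    ('B', [(1.23, 'fine'), (1.5, 'thanks')]),
    ('C', [(2.12, 'good')]),
]

merged = list(merge_flows(flow))
for list_id, (index, value) in merged:
    print(list_id, index, value)
```

This only gives a correctly ordered result when every input list is itself sorted by index. For a handful of short lists, plain sorted is just as practical.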

Answer 2 (score: 2)

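# Put each (index, value) pair first so the default tuple ordering sorts by index
data = [(item, list_id) for (list_id, items) in data for item in items]
data.sort()
for item, list_id in data:
    # %-formatting prints the same way on Python 2 and Python 3
    print('%s %s' % (list_id, item))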


A (0.12, 'how')
A (0.26000000000000001, 'are')
A (0.69999999999999996, 'you')
A (0.90000000000000002, 'mike')
B (1.23, 'fine')
B (1.5, 'thanks')
B (1.6000000000000001, 'and you')
A (1.8999999999999999, "I'm fine too")
C (2.1200000000000001, 'good')
C (2.2400000000000002, 'morning')
C (3.1299999999999999, 'guys')

Answer 3 (score: 2)

Your input:

l = [('A',
    [(0.12, 'how'),
    (0.26000000000000001, 'are'),
    (0.69999999999999996, 'you'),
    (0.90000000000000002, 'mike'),
    (1.8999999999999999, "I'm fine too")]),
    ('B', [(1.23, 'fine'), (1.5, 'thanks'), (1.6000000000000001, 'and you')]),
    ('C',
    [(2.1200000000000001, 'good'),
    (2.2400000000000002, 'morning'),
    (3.1299999999999999, 'guys')])]

Convert (and print):

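from operator import itemgetter
from pprint import pprint

# Flatten into (ListID, (index, value)) records
newlist = []
for alpha, tuplelist in l:
    for tup in tuplelist:
        newlist.append((alpha, tup))

# sorted() returns a new list, so sort in place to keep the result
newlist.sort(key=itemgetter(1))
pprint(newlist)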

Check!

[('A', (0.12, 'how')),
 ('A', (0.26000000000000001, 'are')),
 ('A', (0.69999999999999996, 'you')),
 ('A', (0.90000000000000002, 'mike')),
 ('B', (1.23, 'fine')),
 ('B', (1.5, 'thanks')),
 ('B', (1.6000000000000001, 'and you')),
 ('A', (1.8999999999999999, "I'm fine too")),
 ('C', (2.1200000000000001, 'good')),
 ('C', (2.2400000000000002, 'morning')),
 ('C', (3.1299999999999999, 'guys'))]

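You could of course do this in a list comprehension, but you would still be using 2 for loops and 1 builtin sorted function. Might as well make it verbose and readable.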
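For comparison, a minimal sketch of the list-comprehension form mentioned above. The input here is an abbreviated, assumed sample in the question's format, and the variable is named `l` only to match this answer.

```python
from operator import itemgetter

# Abbreviated sample input in the question's format (assumed for illustration).
l = [
    ('A', [(0.12, 'how'), (1.9, "I'm fine too")]),
    ('B', [(1.23, 'fine')]),
]

# The same two loops as the verbose version, folded into one comprehension,
# then sorted by the (index, value) pair.
newlist = sorted(
    [(alpha, tup) for alpha, tuplelist in l for tup in tuplelist],
    key=itemgetter(1),
)
print(newlist)
```

Both forms do the same work; which one reads better is mostly a matter of taste.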