How to combine elements in tuples or lists accordingly in Python

Time: 2017-01-06 02:53:37

Tags: python list tuples

I have several tuples that look like this. I want to combine all the words that belong to the same sentence.

('1.txt','sentence 1.1','city')
('1.txt','sentence 1.1','apple')
('1.txt','sentence 1.1','ok')
('1.txt','sentence 1.2','go')
('1.txt','sentence 1.2','home')
('1.txt','sentence 1.2','city')
('2.txt','sentence 2.1','sign')
('2.txt','sentence 2.1','tree')
('2.txt','sentence 2.1','cat')
('2.txt','sentence 2.2','good')
('2.txt','sentence 2.2','image')

How can I combine the words by sentence, for example:

('1.txt','sentence 1.1','city apple ok')
('1.txt','sentence 1.2','go home city')
('2.txt','sentence 2.1','sign tree cat')
('2.txt','sentence 2.2','good image')

Or like this, as a list or a dictionary:

['1.txt','sentence 1.1',['city','apple','ok']]
['1.txt','sentence 1.2',['go','home','city']]
['2.txt','sentence 2.1',['sign', 'tree', 'cat']]
['2.txt','sentence 2.2',['good', 'image']]

If I want to convert this to a dictionary, how do I do that?

3 answers:

Answer 0: (score: 2)

Based on your input data, the words appear to be keyed by the combination of the first and second items of each tuple (indices 0 and 1).

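You can build a dictionary that maps this combination of items to the words, then do some post-processing to reshape the data into the structure you want.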

Here is a procedural O(n) approach.

import collections

sentences = collections.defaultdict(list)
for file_name, sentence_id, word in input_data:
    sentences[(file_name, sentence_id)].append(word)

# sentences is now formatted like {('1.txt', 'sentence 1.1'): ['city', 'apple', 'ok']}

for key, val in sentences.items():
    print(list(key) + [val])
    # ['1.txt', 'sentence 1.1', ['city', 'apple', 'ok']]
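
The same grouping can also produce the first format the question asks for, with the words of each sentence joined into one string. This is only a minimal sketch: `input_data` is assumed to be a list of the question's tuples (a subset is used here as sample data), and `combined` is a name chosen for illustration.

```python
import collections

# Assumed sample input: a subset of the tuples from the question.
input_data = [
    ('1.txt', 'sentence 1.1', 'city'),
    ('1.txt', 'sentence 1.1', 'apple'),
    ('1.txt', 'sentence 1.1', 'ok'),
    ('2.txt', 'sentence 2.2', 'good'),
    ('2.txt', 'sentence 2.2', 'image'),
]

# Group the words under their (file name, sentence id) key.
sentences = collections.defaultdict(list)
for file_name, sentence_id, word in input_data:
    sentences[(file_name, sentence_id)].append(word)

# Join each word list into a single space-separated string.
combined = [
    (file_name, sentence_id, ' '.join(words))
    for (file_name, sentence_id), words in sorted(sentences.items())
]
print(combined)
```

Sorting the items is optional; it is done here only so the output order does not depend on dictionary ordering.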

Answer 1: (score: 2)

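You can also use groupby with the first two elements of each tuple as the key, assuming your list of tuples is already sorted by those two elements: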

from itertools import groupby
[[k[0], k[1], [i[2] for i in g]] for k, g in groupby(lst, key = lambda x: x[:2])]

#[['1.txt', 'sentence 1.1', ['city', 'apple', 'ok']],
# ['1.txt', 'sentence 1.2', ['go', 'home', 'city']],
# ['2.txt', 'sentence 2.1', ['sign', 'tree', 'cat']],
# ['2.txt', 'sentence 2.2', ['good', 'image']]]
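
If the tuples are not already sorted, groupby starts a new group every time the key changes, so the same sentence can show up more than once. One possible workaround, sketched here with deliberately shuffled sample data and a hypothetical helper named `first_two`, is to sort by the same key first:

```python
from itertools import groupby

# Assumed sample input, deliberately out of order.
lst = [
    ('2.txt', 'sentence 2.1', 'sign'),
    ('1.txt', 'sentence 1.1', 'city'),
    ('2.txt', 'sentence 2.1', 'tree'),
    ('1.txt', 'sentence 1.1', 'apple'),
]

def first_two(item):
    # Hypothetical helper: the (file name, sentence id) part of a tuple.
    return item[:2]

# sorted() is stable, so words keep their original relative order within a group.
ordered = sorted(lst, key=first_two)
result = [[k[0], k[1], [i[2] for i in g]] for k, g in groupby(ordered, key=first_two)]
print(result)
```

The sort makes this O(n log n) rather than O(n), which is the trade-off against the dictionary-based approach.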

Answer 2: (score: 0)

You can try this:

l=[]
l.append(('1.txt','sentence 1.1','city'))
l.append(('1.txt','sentence 1.1','apple'))
l.append( ('1.txt','sentence 1.1','ok') )
l.append( ('1.txt','sentence 1.2','go') )
l.append( ('1.txt','sentence 1.2','home') )
l.append( ('1.txt','sentence 1.2','city') )
l.append( ('2.txt','sentence 2.1','sign') )
l.append( ('2.txt','sentence 2.1','tree') )
l.append( ('2.txt','sentence 2.1','cat') )
l.append( ('2.txt','sentence 2.2','good') )
l.append( ('2.txt','sentence 2.2','image') )

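d={}
for i in l:
    # Key on the file name and sentence id, joined by a space
    myKey=i[0]+" "+i[1]
    if myKey in d:
        d[myKey].append(i[2])
    else:
        # Start the list with the first word instead of dropping it
        d[myKey]=[i[2]]

ans=[]
for k in d:
    # Split only on the first space so the sentence id stays intact
    v=k.split(" ", 1)
    ans.append([v[0],v[1],d[k]])

print(sorted(ans))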
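
For the dictionary form the question asks about, one option is a nested dictionary keyed first by file name and then by sentence. The nested layout is an assumption about what "dictionary" means in the question, and the sample list below is only a subset of the original data:

```python
# Assumed sample input: a subset of the tuples from the question.
l = [
    ('1.txt', 'sentence 1.1', 'city'),
    ('1.txt', 'sentence 1.1', 'apple'),
    ('1.txt', 'sentence 1.2', 'go'),
    ('2.txt', 'sentence 2.1', 'sign'),
]

d = {}
for file_name, sentence_id, word in l:
    # setdefault creates the inner dict or list the first time a key is seen.
    d.setdefault(file_name, {}).setdefault(sentence_id, []).append(word)

print(d)
```

With this layout, `d['1.txt']['sentence 1.1']` gives the word list for one sentence, and no string splitting is needed to recover the file name and sentence id.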