Question

I have several tuples that look like this. I want to combine all the words that belong to the same sentence.

('1.txt','sentence 1.1','city')
('1.txt','sentence 1.1','apple')
('1.txt','sentence 1.1','ok')
('1.txt','sentence 1.2','go')
('1.txt','sentence 1.2','home')
('1.txt','sentence 1.2','city')
('2.txt','sentence 2.1','sign')
('2.txt','sentence 2.1','tree')
('2.txt','sentence 2.1','cat')
('2.txt','sentence 2.2','good')
('2.txt','sentence 2.2','image')

How can I combine the words by sentence, for example:

('1.txt','sentence 1.1','city apple ok')
('1.txt','sentence 1.2','go home city')
('2.txt','sentence 2.1','sign tree cat')
('2.txt','sentence 2.2','good image')

Or like this, as a list or dictionary:

['1.txt','sentence 1.1',['city','apple','ok']]
['1.txt','sentence 1.2',['go','home','city']]
['2.txt','sentence 2.1',['sign', 'tree', 'cat']]
['2.txt','sentence 2.2',['good', 'image']]

If I want to convert the result to a dictionary, how would I do that?

Answer 1

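Based on your input data, the words appear to be keyed on the combination of the first and second items of each tuple (indexes 0 and 1).

You can build a dictionary that maps this combination to its words, then do some post-processing to reshape the data into the structure you want.

Here is a procedural O(n) approach: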

import collections

sentences = collections.defaultdict(list)
for file_name, sentence_id, word in input_data:
    sentences[(file_name, sentence_id)].append(word)

# sentences is now formatted like {('1.txt', 'sentence 1.1'): ['city', 'apple', 'ok']}

for key, val in sentences.items():
    print(list(key) + [val])
    # ['1.txt', 'sentence 1.1', ['city', 'apple', 'ok']]
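
The same grouping can also produce the first format from the question (one space-joined string per sentence) and a plain dictionary. This is only a minimal sketch: it assumes the tuples are held in a list named `input_data` as in the snippet above, the sample values are copied from the question, and `combined` and `as_dict` are illustrative names.

```python
import collections

# Sample data copied from the question.
input_data = [
    ('1.txt', 'sentence 1.1', 'city'),
    ('1.txt', 'sentence 1.1', 'apple'),
    ('1.txt', 'sentence 1.1', 'ok'),
    ('1.txt', 'sentence 1.2', 'go'),
    ('1.txt', 'sentence 1.2', 'home'),
    ('1.txt', 'sentence 1.2', 'city'),
    ('2.txt', 'sentence 2.1', 'sign'),
    ('2.txt', 'sentence 2.1', 'tree'),
    ('2.txt', 'sentence 2.1', 'cat'),
    ('2.txt', 'sentence 2.2', 'good'),
    ('2.txt', 'sentence 2.2', 'image'),
]

# Group words under a (file_name, sentence_id) key.
sentences = collections.defaultdict(list)
for file_name, sentence_id, word in input_data:
    sentences[(file_name, sentence_id)].append(word)

# One tuple per sentence, with the words joined by a single space.
combined = [(f, s, ' '.join(words)) for (f, s), words in sentences.items()]

# A plain dict keyed by (file_name, sentence_id), without defaultdict behaviour.
as_dict = dict(sentences)
```

Converting to a plain `dict` at the end means a lookup with an unknown key raises `KeyError` instead of silently creating an empty list.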

Answer 2

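You can also use groupby with the first two elements of each tuple as the key, assuming your list of tuples is already sorted by those two elements (groupby only merges consecutive items with equal keys):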

from itertools import groupby
[[k[0], k[1], [i[2] for i in g]] for k, g in groupby(lst, key = lambda x: x[:2])]

#[['1.txt', 'sentence 1.1', ['city', 'apple', 'ok']],
# ['1.txt', 'sentence 1.2', ['go', 'home', 'city']],
# ['2.txt', 'sentence 2.1', ['sign', 'tree', 'cat']],
# ['2.txt', 'sentence 2.2', ['good', 'image']]]
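
If the list is not already sorted by file and sentence, one option is to sort it by the same key first. The following is a sketch under that assumption; the interleaved sample list is made up for illustration, and `key`, `ordered` and `grouped` are illustrative names. Because `sorted` is stable, the words keep their original relative order within each sentence.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input where tuples of different sentences are interleaved.
lst = [
    ('2.txt', 'sentence 2.1', 'sign'),
    ('1.txt', 'sentence 1.1', 'city'),
    ('2.txt', 'sentence 2.1', 'tree'),
    ('1.txt', 'sentence 1.1', 'apple'),
]

# itemgetter(0, 1) returns the (file_name, sentence_id) pair of a tuple.
key = itemgetter(0, 1)

# Sort by the grouping key so that equal keys become consecutive.
ordered = sorted(lst, key=key)

grouped = [[k[0], k[1], [t[2] for t in g]] for k, g in groupby(ordered, key=key)]
```

The sort makes this O(n log n), whereas a dictionary-based grouping stays O(n) and needs no sorted input.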

Answer 3

You can try this:

l=[]
l.append(('1.txt','sentence 1.1','city'))
l.append(('1.txt','sentence 1.1','apple'))
l.append( ('1.txt','sentence 1.1','ok') )
l.append( ('1.txt','sentence 1.2','go') )
l.append( ('1.txt','sentence 1.2','home') )
l.append( ('1.txt','sentence 1.2','city') )
l.append( ('2.txt','sentence 2.1','sign') )
l.append( ('2.txt','sentence 2.1','tree') )
l.append( ('2.txt','sentence 2.1','cat') )
l.append( ('2.txt','sentence 2.2','good') )
l.append( ('2.txt','sentence 2.2','image') )

d={}
for i in l:
    myKey=i[0]+" "+i[1]
    if myKey in d:
        d[myKey].append(i[2])
    else:
        # start the list with the current word so the first word is not lost
        d[myKey]=[i[2]]

ans=[]
for k in d:
    # split only on the first space so the sentence id stays in one piece
    v=k.split(" ",1)
    ans.append([v[0],v[1],d[k]])

print(sorted(ans))
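
Joining the file name and sentence id into one string works for this data, but splitting the key back apart would go wrong if a file name contained a space. A possible variation, shown only as a sketch, keeps the pair as a tuple key and uses `dict.setdefault`; the file name `'my file.txt'` is a made-up example.

```python
# Hypothetical data: the first file name contains a space on purpose.
l = [
    ('my file.txt', 'sentence 1.1', 'city'),
    ('my file.txt', 'sentence 1.1', 'apple'),
    ('2.txt', 'sentence 2.2', 'good'),
]

d = {}
for file_name, sentence_id, word in l:
    # setdefault creates the empty list on first sight of the key, then returns it.
    d.setdefault((file_name, sentence_id), []).append(word)

# Rebuild the [file, sentence, [words]] rows without any string splitting.
ans = sorted([f, s, words] for (f, s), words in d.items())
```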
