I have several tuples that look like this. I want to combine all the words that belong to the same sentence.
('1.txt','sentence 1.1','city')
('1.txt','sentence 1.1','apple')
('1.txt','sentence 1.1','ok')
('1.txt','sentence 1.2','go')
('1.txt','sentence 1.2','home')
('1.txt','sentence 1.2','city')
('2.txt','sentence 2.1','sign')
('2.txt','sentence 2.1','tree')
('2.txt','sentence 2.1','cat')
('2.txt','sentence 2.2','good')
('2.txt','sentence 2.2','image')
How can I combine the words by sentence, for example:
('1.txt','sentence 1.1','city apple ok')
('1.txt','sentence 1.2','go home city')
('2.txt','sentence 2.1','sign tree cat')
('2.txt','sentence 2.2','good image')
Or in this form, as lists or a dictionary:
['1.txt','sentence 1.1',['city','apple','ok']]
['1.txt','sentence 1.2',['go','home','city']]
['2.txt','sentence 2.1',['sign', 'tree', 'cat']]
['2.txt','sentence 2.2',['good', 'image']]
If I want to convert this to a dictionary, how should I do that?
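Answer 0 (score: 2)
Based on your input data, the words appear to be keyed by the combination of the first and second items of each tuple (indices 0 and 1).
You can build a dictionary that maps this combination to its words, then do some post-processing to reformat the data into the structure you want.
Here is a procedural O(n) approach.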
import collections

# input_data is the list of (file_name, sentence_id, word) tuples from the question
sentences = collections.defaultdict(list)
for file_name, sentence_id, word in input_data:
    sentences[(file_name, sentence_id)].append(word)

# sentences now looks like {('1.txt', 'sentence 1.1'): ['city', 'apple', 'ok'], ...}
for key, val in sentences.items():
    print(list(key) + [val])
# ['1.txt', 'sentence 1.1', ['city', 'apple', 'ok']]
# ...
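The question also asks for the words joined into one string and for a dictionary. A minimal sketch of both, built on the same grouping idea, might look like the following. The `input_data` list here is a hypothetical sample copied from the question, and the nested file-to-sentence layout is only one possible reading of "dictionary"; the question does not say which shape is wanted.

```python
import collections

# Hypothetical sample input, copied from the tuples in the question.
input_data = [
    ('1.txt', 'sentence 1.1', 'city'),
    ('1.txt', 'sentence 1.1', 'apple'),
    ('1.txt', 'sentence 1.1', 'ok'),
    ('1.txt', 'sentence 1.2', 'go'),
    ('1.txt', 'sentence 1.2', 'home'),
    ('1.txt', 'sentence 1.2', 'city'),
    ('2.txt', 'sentence 2.1', 'sign'),
    ('2.txt', 'sentence 2.1', 'tree'),
    ('2.txt', 'sentence 2.1', 'cat'),
    ('2.txt', 'sentence 2.2', 'good'),
    ('2.txt', 'sentence 2.2', 'image'),
]

# Group words under the (file_name, sentence_id) pair.
sentences = collections.defaultdict(list)
for file_name, sentence_id, word in input_data:
    sentences[(file_name, sentence_id)].append(word)

# First requested format: tuples with the words joined by spaces.
joined = [(file_name, sentence_id, ' '.join(words))
          for (file_name, sentence_id), words in sentences.items()]

# Assumed dictionary layout: file name -> sentence id -> list of words.
nested = {}
for (file_name, sentence_id), words in sentences.items():
    nested.setdefault(file_name, {})[sentence_id] = words

print(joined)
print(nested)
```

On Python 3.7 or later, dictionaries keep insertion order, so the groups come out in the order they first appear in the input.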
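Answer 1 (score: 2)
You can also use groupby, taking the first two elements of each tuple as the key, assuming your list of tuples is already sorted by those two elements: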
from itertools import groupby
# lst is the list of tuples from the question
[[k[0], k[1], [i[2] for i in g]] for k, g in groupby(lst, key=lambda x: x[:2])]
#[['1.txt', 'sentence 1.1', ['city', 'apple', 'ok']],
# ['1.txt', 'sentence 1.2', ['go', 'home', 'city']],
# ['2.txt', 'sentence 2.1', ['sign', 'tree', 'cat']],
# ['2.txt', 'sentence 2.2', ['good', 'image']]]
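If the list is not already ordered by the first two elements, groupby would start a new group each time the key changes, so one sentence could end up split across several groups. One way around that, sketched here under the assumption that sorting the input is acceptable, is to sort by the same key first. The shuffled `lst` below is a hypothetical sample, not data from the question.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical unsorted sample: tuples from the question in shuffled order.
lst = [
    ('2.txt', 'sentence 2.1', 'sign'),
    ('1.txt', 'sentence 1.1', 'city'),
    ('2.txt', 'sentence 2.1', 'tree'),
    ('1.txt', 'sentence 1.2', 'go'),
    ('1.txt', 'sentence 1.1', 'apple'),
    ('1.txt', 'sentence 1.2', 'home'),
]

# Key on (file name, sentence id), i.e. tuple indices 0 and 1.
key = itemgetter(0, 1)

# sorted() is stable, so words keep their original relative order
# inside each sentence.
grouped = [[file_name, sentence_id, [item[2] for item in group]]
           for (file_name, sentence_id), group
           in groupby(sorted(lst, key=key), key=key)]

print(grouped)
```

The sort makes this O(n log n) rather than O(n), which is the trade-off compared with the dictionary-based approach.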
Answer 2 (score: 0)
You can try this:
l = []
l.append(('1.txt', 'sentence 1.1', 'city'))
l.append(('1.txt', 'sentence 1.1', 'apple'))
l.append(('1.txt', 'sentence 1.1', 'ok'))
l.append(('1.txt', 'sentence 1.2', 'go'))
l.append(('1.txt', 'sentence 1.2', 'home'))
l.append(('1.txt', 'sentence 1.2', 'city'))
l.append(('2.txt', 'sentence 2.1', 'sign'))
l.append(('2.txt', 'sentence 2.1', 'tree'))
l.append(('2.txt', 'sentence 2.1', 'cat'))
l.append(('2.txt', 'sentence 2.2', 'good'))
l.append(('2.txt', 'sentence 2.2', 'image'))

# Group words under a "file_name sentence_id" string key.
d = {}
for i in l:
    myKey = i[0] + " " + i[1]
    if myKey in d:
        d[myKey].append(i[2])
    else:
        # Start the list with the first word instead of dropping it.
        d[myKey] = [i[2]]

ans = []
for k in d:
    # Split only on the first space, because the sentence id itself contains a space.
    # This assumes the file name has no spaces.
    file_name, sentence_id = k.split(" ", 1)
    ans.append([file_name, sentence_id, d[k]])
print(sorted(ans))