I want to remove the duplicates from column 1 and, using Python, return in column 2 a list of the values associated with each unique item.
Input
1 2
Jack London 'Son of the Wolf'
Jack London 'Chris Farrington'
Jack London 'The God of His Fathers'
Jack London 'Children of the Frost'
William Shakespeare 'Venus and Adonis'
William Shakespeare 'The Rape of Lucrece'
Oscar Wilde 'Ravenna'
Oscar Wilde 'Poems'
and the output should be
1 2
Jack London 'Son of the Wolf, Chris Farrington, Able Seaman, The God of His Fathers,Children of the Frost'
William Shakespeare 'The Rape of Lucrece,Venus and Adonis'
Oscar Wilde 'Ravenna,Poems'
where the second column holds the combined values associated with each item. I tried the set() function on a dictionary:
dic = {'Jack London': 'Son of the Wolf', 'Jack London': 'Chris Farrington', 'Jack London': 'The God of His Fathers'}
set(dic)
but it returns only the first key of the dictionary:
set(['Jack London'])
Answer 0 (score: 2)
In Python, a dictionary can hold only one value per key. That value, however, can be a collection of items:
>>> d = {'Jack London': ['Son of the Wolf', 'Chris Farrington']}
>>> d['Jack London']
['Son of the Wolf', 'Chris Farrington']
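As a small illustration of the one-value-per-key rule (a sketch that reuses the titles from the question): when a dict literal repeats a key, each later value overwrites the earlier one, which is why the dictionary in the question ends up with a single key.

```python
# Repeating a key in a dict literal keeps only the last value assigned to it.
dic = {'Jack London': 'Son of the Wolf',
       'Jack London': 'Chris Farrington',
       'Jack London': 'The God of His Fathers'}

print(len(dic))            # 1
print(dic['Jack London'])  # The God of His Fathers
```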
To build such a dictionary from a sequence of key-value pairs, you can do the following:
dct = {}
for author, title in items:
    if author not in dct:
        # Create a new entry for the author
        dct[author] = [title]
    else:
        # Add another item to the existing entry
        dct[author].append(title)
The loop body can be written more concisely:
dct = {}
for author, title in items:
    dct.setdefault(author, []).append(title)
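A possible variation, not part of the original answer: collections.defaultdict does the same grouping without setdefault, and str.join then turns each list into the comma-separated string the question asks for. The names items, titles_by_author and joined are illustrative, and items is assumed to be a list of (author, title) pairs.

```python
from collections import defaultdict

# Assumed input: a list of (author, title) pairs, as in the question.
items = [
    ('Jack London', 'Son of the Wolf'),
    ('Jack London', 'Chris Farrington'),
    ('William Shakespeare', 'Venus and Adonis'),
    ('Oscar Wilde', 'Ravenna'),
    ('Oscar Wilde', 'Poems'),
]

# A missing key is created automatically with an empty list.
titles_by_author = defaultdict(list)
for author, title in items:
    titles_by_author[author].append(title)

# Join each author's titles into one comma-separated string.
joined = {author: ', '.join(titles)
          for author, titles in titles_by_author.items()}

print(joined['Oscar Wilde'])  # Ravenna, Poems
```

Unlike groupby, this approach does not require the input to be sorted by author.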
Answer 1 (score: 2)
You should use itertools.groupby, since your list is already sorted.
rows = [('1', '2'),
('Jack London', 'Son of the Wolf'),
('Jack London', 'Chris Farrington'),
('Jack London', 'The God of His Fathers'),
('Jack London', 'Children of the Frost'),
('William Shakespeare', 'Venus and Adonis'),
('William Shakespeare', 'The Rape of Lucrece'),
('Oscar Wilde', 'Ravenna'),
('Oscar Wilde', 'Poems')]
# I'm not sure how you build this list, but this is the shape the data needs to be in
from itertools import groupby
from operator import itemgetter
grouped = groupby(rows, itemgetter(0))
result = {group:', '.join([value[1] for value in values]) for group, values in grouped}
This gives you the following result:
In [1]: pprint(result)
{'1': '2',
'Jack London': 'Son of the Wolf, Chris Farrington, The God of His Fathers, '
'Children of the Frost',
'Oscar Wilde': 'Ravenna, Poems',
'William Shakespeare': 'Venus and Adonis, The Rape of Lucrece'}
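One caveat worth sketching: groupby only merges consecutive rows with the same key, so input that is not already ordered by author needs to be sorted first. The unsorted_rows data below is hypothetical and only serves to show the effect.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical unsorted input: the same author appears in non-adjacent rows.
unsorted_rows = [
    ('Oscar Wilde', 'Ravenna'),
    ('Jack London', 'Son of the Wolf'),
    ('Oscar Wilde', 'Poems'),
    ('Jack London', 'Chris Farrington'),
]

# sorted() is stable, so each author's titles keep their original relative order.
sorted_rows = sorted(unsorted_rows, key=itemgetter(0))

result = {author: ', '.join(title for _, title in group)
          for author, group in groupby(sorted_rows, key=itemgetter(0))}

print(result)
# {'Jack London': 'Son of the Wolf, Chris Farrington', 'Oscar Wilde': 'Ravenna, Poems'}
```

If the header row ('1', '2') should not appear in the result, slicing it off first (for example rows[1:]) is one simple option.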