Question

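Suppose I have a set of tuples containing people's names. I want to find everyone who shares a last name with someone else, excluding people whose last name is unique.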

# input
names = set([('John', 'Lee'), ('Mary', 'Miller'), ('Paul', 'Ryan'), 
             ('Bob', 'Ryan'), ('Tina', 'Lee'), ('Bob', 'Smith')])

# expected output
{'Lee': ['Tina', 'John'], 'Ryan': ['Bob', 'Paul']} # or similar

This is what I'm currently using:

def find_family(names):
    result = {}

    try:
        while True:
            name = names.pop()
            if name[1] in result:
                result[name[1]].append(name[0])
            else:
                result[name[1]] = [name[0]]
    except KeyError:
        pass

    return dict(filter(lambda x: len(x[1]) > 1, result.items()))

This looks ugly and inefficient. Is there a better way?

Answer 1

A defaultdict can be used to simplify the code:

from collections import defaultdict

def find_family(names):
    d = defaultdict(list)
    for fn, ln in names:
        d[ln].append(fn)
    return dict((k,v) for (k,v) in d.items() if len(v)>1)

names = set([('John', 'Lee'), ('Mary', 'Miller'), ('Paul', 'Ryan'), 
             ('Bob', 'Ryan'), ('Tina', 'Lee'), ('Bob', 'Smith')])
print(find_family(names))

Prints (the order within each list may vary, since sets are unordered):

{'Lee': ['Tina', 'John'], 'Ryan': ['Bob', 'Paul']}

Answer 2

Instead of a while loop, use a for loop (or a similar construct) over the contents of the set. While you're at it, you can unpack the tuples:

for firstname, surname in names:
    # do your stuff

You may want to use a defaultdict or OrderedDict (http://docs.python.org/library/collections.html) to hold the data you collect in the loop body.
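As a rough sketch of how that loop body might be filled in with an OrderedDict: the function name find_family_ordered is made up for illustration, and iterating over sorted(names) is an extra assumption added here so the result is reproducible, not something the approach requires.

```python
from collections import OrderedDict

def find_family_ordered(names):
    # Hypothetical helper name, used only for this illustration.
    families = OrderedDict()
    # Assumption: sort the set first so surnames appear in a stable order.
    for firstname, surname in sorted(names):
        # setdefault creates the list the first time a surname is seen.
        families.setdefault(surname, []).append(firstname)
    # Keep only surnames shared by more than one person.
    return OrderedDict(
        (surname, firstnames)
        for surname, firstnames in families.items()
        if len(firstnames) > 1
    )

names = set([('John', 'Lee'), ('Mary', 'Miller'), ('Paul', 'Ryan'),
             ('Bob', 'Ryan'), ('Tina', 'Lee'), ('Bob', 'Smith')])
print(find_family_ordered(names))
```

Unlike the pop()-based version in the question, this leaves the input set untouched.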

Answer 3

>>> names = set([('John', 'Lee'), ('Mary', 'Miller'), ('Paul', 'Ryan'), 
...              ('Bob', 'Ryan'), ('Tina', 'Lee'), ('Bob', 'Smith')])

With a for loop you can easily build a dictionary of all the people, keyed by last name:

>>> families = {}
>>> for name, lastname in names:
...   families[lastname] = families.get(lastname, []) + [name]
... 
>>> families
{'Miller': ['Mary'], 'Smith': ['Bob'], 'Lee': ['Tina', 'John'], 'Ryan': ['Bob', 'Paul']}

Then you only need to filter the dictionary with the condition len(names) > 1. This filtering can be done with a "dictionary comprehension":

>>> filtered_families = {lastname: names for lastname, names in families.items() if len(names) > 1}
>>> filtered_families
{'Lee': ['Tina', 'John'], 'Ryan': ['Bob', 'Paul']}
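The two steps above could be wrapped into a single function, roughly as follows. This is only a sketch: the name find_family_setdefault is hypothetical, dict.setdefault is swapped in for the families.get(...) + [name] idiom so that no new list is built on every iteration, and the sorted() call is an extra assumption to make the output predictable.

```python
def find_family_setdefault(names):
    # Hypothetical helper name, used only for this illustration.
    families = {}
    for name, lastname in names:
        # Append in place rather than building a new list each time.
        families.setdefault(lastname, []).append(name)
    # Keep only last names with more than one member; sort each list
    # (an added assumption) so the result does not depend on set order.
    return {lastname: sorted(members)
            for lastname, members in families.items()
            if len(members) > 1}

names = set([('John', 'Lee'), ('Mary', 'Miller'), ('Paul', 'Ryan'),
             ('Bob', 'Ryan'), ('Tina', 'Lee'), ('Bob', 'Smith')])
print(find_family_setdefault(names))
```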
