Question

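I'm working on a program that reads an IMDB text file and outputs the top actors (ranked by number of movie appearances), where the number of entries N is supplied by the user.

However, I have a problem: actors who appear in the same number of movies are printed on separate lines, which I need to avoid. Instead, if two actors are each in 5 movies, for example, the number 5 should be printed once, followed by the actors' names joined with semicolons.

I've tried several workarounds, but none of them has worked yet. Any suggestions?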

if __name__ == "__main__":
    imdb_file = raw_input("Enter the name of the IMDB file ==> ").strip()
    print imdb_file
    # int(raw_input(...)) avoids evaluating arbitrary text, unlike input() in Python 2
    N = int(raw_input('Enter the number of top individuals ==> '))
    print N

    # Map each actor to the set of distinct movies they appear in.
    # Each line of the file is expected to look like "actor | movie".
    actors_to_movies = {}
    with open(imdb_file) as f:
        for line in f:
            words = line.strip().split('|')
            actor = words[0].strip()
            movie = words[1].strip()
            if actor not in actors_to_movies:
                actors_to_movies[actor] = set()
            actors_to_movies[actor].add(movie)

    # Arrange the dictionary into a list of (actor, movies) tuples,
    # sorted by number of movies in descending order
    D = [(x, actors_to_movies[x]) for x in actors_to_movies]
    descending = sorted(D, key=lambda x: len(x[1]), reverse=True)

    # Print the top N entries, one line per actor (so ties end up on separate lines)
    for i in range(min(N, len(descending))):
        print str(len(descending[i][1])) + ':', descending[i][0]
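For reference, here is a minimal sketch of the requested output that works directly on a dictionary shaped like actors_to_movies (actor name mapped to a set of movies). It inverts the mapping into movie count → list of actor names, then emits the N largest counts with tied names joined by semicolons. The function name format_top_groups and the sample data are hypothetical. It assumes "top N" means N distinct movie counts (the same reading the answer below uses) and sorts tied names alphabetically, which the question does not specify. It is written to run under both Python 2.7 and Python 3.

```python
from collections import defaultdict


def format_top_groups(actors_to_movies, n):
    """Return up to n lines of the form "count: name1;name2",
    one line per distinct movie count, largest count first.

    Assumption: actors_to_movies maps an actor name to a set of movie
    titles, like the dictionary built in the question's code.
    """
    # Invert the mapping: movie count -> list of actors with that count
    actors_by_count = defaultdict(list)
    for actor, movies in actors_to_movies.items():
        actors_by_count[len(movies)].append(actor)

    lines = []
    # Visit the distinct counts from largest to smallest, keeping the first n
    for count in sorted(actors_by_count, reverse=True)[:n]:
        # Assumption: ties are listed alphabetically so the output
        # does not depend on dictionary ordering
        names = ';'.join(sorted(actors_by_count[count]))
        lines.append('%d: %s' % (count, names))
    return lines


if __name__ == "__main__":
    # Hypothetical sample data standing in for a parsed IMDB file
    sample = {
        "Alice": {"M1", "M2", "M3"},
        "Bob": {"M1", "M4", "M5"},
        "Carol": {"M2"},
        "Dave": {"M6", "M7"},
    }
    for line in format_top_groups(sample, 2):
        print(line)
```

With the sample data this prints "3: Alice;Bob" followed by "2: Dave". Grouping through a dictionary avoids having to sort the whole actor list before grouping; only the distinct counts are sorted.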

Answer 1

There is a useful function for this: itertools.groupby.

It splits an iterable into groups of consecutive elements that share the same key. Once the list is sorted by movie count, you can easily write a function that prints the top actors:

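import itertools
from operator import itemgetter

def print_top_actors(actor_info_list, top=5):
    """
    :param actor_info_list: list of (actor_name, movie_count) tuples
    :param top: how many distinct movie counts (groups) to print
    """
    by_count = itemgetter(1)
    # Sort by movie count, largest first, so equal counts end up adjacent
    # (groupby only merges consecutive items with the same key)
    ordered = sorted(actor_info_list, key=by_count, reverse=True)
    # Group on the movie count itself, not on the whole (name, count) tuple
    for i, (movie_count, actor_iter) in enumerate(itertools.groupby(ordered, key=by_count)):
        if i >= top:
            break
        print movie_count, ';'.join(actor for actor, _ in actor_iter)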

And an example of usage:

>>> print_top_actors(
...     [
...         ("DiCaprio", 100500),
...         ("Pitt", 100500),
...         ("foo", 10),
...         ("bar", 10),
...         ("baz", 10),
...         ("qux", 3),
...         ("lol", 1)
...     ], top = 3)
100500 DiCaprio;Pitt
10 foo;bar;baz
3 qux
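Two details of itertools.groupby matter here: it merges only consecutive items whose keys are equal, and without a key argument it compares entire elements. The snippet below uses illustrative (name, count) pairs, deliberately not pre-sorted, to show three cases side by side: no key, a key but no sorting, and sorting by the same key before grouping. It runs under both Python 2.7 and Python 3.

```python
import itertools
from operator import itemgetter

by_count = itemgetter(1)  # key function: the movie count in a (name, count) pair

# Illustrative data: equal counts are NOT adjacent in the input
pairs = [("DiCaprio", 100500), ("foo", 10), ("Pitt", 100500), ("bar", 10)]

# 1. Without a key, groupby compares whole tuples, so every distinct pair is its own group
no_key = [key for key, _ in itertools.groupby(sorted(pairs, key=by_count, reverse=True))]

# 2. With a key but without sorting, only adjacent equal counts are merged
unsorted = [key for key, _ in itertools.groupby(pairs, key=by_count)]

# 3. Sorting by the same key first puts equal counts next to each other
ordered = sorted(pairs, key=by_count, reverse=True)  # stable: ties keep input order
grouped = [(count, [name for name, _ in group])
           for count, group in itertools.groupby(ordered, key=by_count)]

print(no_key)    # [('DiCaprio', 100500), ('Pitt', 100500), ('foo', 10), ('bar', 10)]
print(unsorted)  # [100500, 10, 100500, 10]
print(grouped)   # [(100500, ['DiCaprio', 'Pitt']), (10, ['foo', 'bar'])]
```

Because Python's sort is stable even with reverse=True, actors with the same count keep their original relative order within each group.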

Combining elements in a list of tuples?
