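I'm working on a program that takes in an IMDB text file and, based on a user input N, outputs the top N actors (ranked by number of movie appearances).
However, I'm running into a problem: actors who appear in the same number of movies each take up their own slot, which I need to avoid. Instead, if two actors are each in 5 movies, for example, the number 5 should appear once and the actors' names should be combined, separated by a semicolon.
I've tried several workarounds, but nothing has worked so far. Any suggestions?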
if __name__ == "__main__":
    imdb_file = raw_input("Enter the name of the IMDB file ==> ").strip()
    print imdb_file
    N = input('Enter the number of top individuals ==> ')
    print N

    actors_to_movies = {}

    for line in open(imdb_file):
        words = line.strip().split('|')
        actor = words[0].strip()
        movie = words[1].strip()
        if not actor in actors_to_movies:
            actors_to_movies[actor] = set()
        actors_to_movies[actor].add(movie)
        movie_list = sorted(list(actors_to_movies[actor]))

    #Arranges Dictionary into List of Tuples#
    D = [(x, actors_to_movies[x]) for x in actors_to_movies]
    descending = sorted(D, key=lambda x: len(x[1]), reverse=True)

    #Prints Tuples in Descending Order N number of times (User Input)#
    for i in range(N):
        print str(len(descending[i][1])) + ':', descending[i][0]
Answer 0 (score: 3)
There is a useful function for this: itertools.groupby.
It lets you split a list into groups of consecutive items that share the same key. Using it, you can easily write a function that prints the top actors:
import itertools

def print_top_actors(actor_info_list, top=5):
    """
    :param actor_info_list: list of (actor_name, movie_count) tuples
    """
    # groupby only merges adjacent items, so sort by movie count first
    actor_info_list.sort(key=lambda x: x[1], reverse=True)
    # group on the movie count; without key= every tuple would form its own group
    groups = itertools.groupby(actor_info_list, key=lambda x: x[1])
    for i, (movie_count, actor_iter) in enumerate(groups):
        if i >= top:
            break
        print movie_count, ';'.join(actor for actor, _ in actor_iter)
And an example of usage:
>>> print_top_actors(
...     [
...         ("DiCaprio", 100500),
...         ("Pitt", 100500),
...         ("foo", 10),
...         ("bar", 10),
...         ("baz", 10),
...         ("qux", 3),
...         ("lol", 1)
...     ], top=3)
100500 DiCaprio;Pitt
10 foo;bar;baz
3 qux
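To connect this back to the question's data, here is a minimal sketch that starts from a dictionary shaped like actors_to_movies (actor name mapped to a set of movie titles) and returns the grouped rows instead of printing them. It is written for Python 3, unlike the Python 2 snippets above. The function name group_top_actors and the sample data are made up for illustration, and breaking ties alphabetically is an assumption the question does not ask for.

```python
import itertools


def group_top_actors(actors_to_movies, top=5):
    """Return up to `top` (movie_count, "name;name") pairs, highest count first."""
    # Sort by descending count, then by name, so tied actors sit next to
    # each other and are joined in a predictable order.
    counts = sorted(
        ((actor, len(movies)) for actor, movies in actors_to_movies.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )
    # groupby only merges adjacent items, hence the sort above.
    grouped = itertools.groupby(counts, key=lambda pair: pair[1])
    return [
        (count, ";".join(actor for actor, _ in group))
        for count, group in itertools.islice(grouped, top)
    ]


# Made-up sample data in the same shape as the question's dictionary.
sample = {
    "DiCaprio": {"A", "B", "C"},
    "Pitt": {"A", "C", "D"},
    "foo": {"A"},
}
for count, names in group_top_actors(sample, top=2):
    print("{}: {}".format(count, names))
```

With this approach N counts distinct movie totals rather than individual actors, so a tie never uses up more than one of the N slots.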