Limiting the values in a Python dictionary

Time: 2014-03-25 08:23:36

Tags: python dictionary

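I have a script that takes a file containing actor and movie names and builds a hash of the actors in each movie. My current code is below. I want to limit the size of each dictionary value to 10, i.e. only the first 10 actors of each movie should be added to the dictionary. I tried a loop with break, but my approach does not work.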

Update: I tried to incorporate @jonrsharpe's suggestion, but it only changes the shape of my dictionary without limiting each movie's actor list to 10:

movietoactorfile = open('mov2act.pickle', 'w')
movietoactor = {}

for line in gzip.open(moviefile_name, 'rb').readlines():
  (actor, movie, rank) = line.rstrip('\r\n\s').split('\t')
  if movie not in movietoactor:
    movietoactor[movie] = []
  movietoactor[movie].append(actor)

for movie in movietoactor:
  s = "\t".join(movietoactor[movie][:10])

pickle.dump(movietoactor, movietoactorfile)

Original sample output:

S'Irma la Douce (1963)'
p1
S"\tDeauville, Sheryl\tEarl, Jane\tEarl, Ruth\tHoliday, Hope\tMacLaine, Shirley\tSatana, Tura\tShawlee, Joan\tWhitney, Grace Lee\tWoods, Susan (I)\tYoung, Harriette\tAlvin, John (I)\tBarrier, Edgar\tBeck, Billy (I)\tBernardi, Herschel\tBixby, Bill\tBrown, James (II)\tCaan, James\tDiamond, Don\tDubov, Paul\tJacobi, Lou\tJourdan, Louis (I)\tKrugman, Lou\tLemmon, Jack (I)\tLerner, Diki\tMcNear, Howard\tMoustache\tO'Dell, Doye\tOsmond, Cliff\tPalma, Joe\tPeel, Richard\tYarnell, Bruce"
p2
sS'American Buffalo (1996)'
p3
S'\tFranz, Dennis (I)\tHoffman, Dustin\tNelson, Sean (I)'

Current output of the code above:

S'Irma la Douce (1963)'
p1
(lp2
S'Deauville, Sheryl'
p3
aS'Earl, Jane'
p4
aS'Earl, Ruth'
p5
aS'Holiday, Hope'
p6
aS'MacLaine, Shirley'
p7
aS'Satana, Tura'
p8
aS'Shawlee, Joan'
p9
aS'Whitney, Grace Lee'
p10
aS'Woods, Susan (I)'
p11
aS'Young, Harriette'
p12
aS'Alvin, John (I)'
p13
aS'Barrier, Edgar'
p14
aS'Beck, Billy (I)'
p15
aS'Bernardi, Herschel'
p16
aS'Bixby, Bill'
p17
aS'Brown, James (II)'
p18
aS'Caan, James'
p19
aS'Diamond, Don'
p20
aS'Dubov, Paul'
p21
aS'Jacobi, Lou'
p22
aS'Jourdan, Louis (I)'
p23
aS'Krugman, Lou'
p24
aS'Lemmon, Jack (I)'
p25
aS'Lerner, Diki'
p26
aS'McNear, Howard'
p27
aS'Moustache'
p28
aS"O'Dell, Doye"
p29
aS'Osmond, Cliff'
p30
aS'Palma, Joe'
p31
aS'Peel, Richard'
p32
aS'Yarnell, Bruce'
p33
asS'American Buffalo (1996)'
p34
(lp35
S'Franz, Dennis (I)'
p36
aS'Hoffman, Dustin'

The desired output should look like this:

S'Irma la Douce (1963)'
p1
S"\tDeauville, Sheryl\tEarl, Jane\tEarl, Ruth\tHoliday, Hope\tMacLaine, Shirley\tSatana, Tura\tShawlee, Joan\tWhitney, Grace Lee\tWoods, Susan (I)\tYoung, Harriette"
p2
sS'American Buffalo (1996)'
p3
S'\tFranz, Dennis (I)\tHoffman, Dustin\tNelson, Sean (I)'

Regarding the suggestion to move i = 1 outside the loop: that was the first edit I tried, and it does not work:

movietoactorfile = open('mov2act.pickle', 'w')
movietoactor = {}

i = 1
for line in gzip.open(moviefile_name, 'rb').readlines():
  (actor, movie, rank) = line.rstrip('\r\n\s').split('\t')
  if movie not in movietoactor:
    movietoactor[movie] = ''
  movietoactor[movie] += '\t%s' % actor
  i += 1
  if i > 10:
    break

pickle.dump(movietoactor, movietoactorfile)

Output:

S'\tactor'
p6
sS'Queen of the Damned (2002)'
p7
S'\tAaliyah'
p8
sS'Kauas pilvet karkaavat (1996)'
p9
S'\tAaltonen, Minna'
p10
sS'Class Act (1992)'
p11
S'\tAalda, Mariann'
p12
sS'Twenty Bucks (1993)'
p13
S'\tAabel, Per (II)'
p14
sS'South Pacific (1958)'
p15
S'\tAadland, Beverly'
p16
sS'Tomorrow Never Dies (1997)'
p17
S'\tAaltonen, Minna'
p18
sS'Romeo Must Die (2000)'
p19
S'\tAaliyah'
p20
s.

1 answer:

Answer 0 (score: 1)

You reset i to 1 on every pass through the for loop; the minimal fix is to move it outside:

i = 1
for line in gzip.open(moviefile_name, 'rb').readlines():

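Edit: this does not work, because the file contains more than one movie, so a single counter stops reading after ten lines in total. You could keep a separate count for each movie, but you could also skip to the second part of this answer.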
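
A minimal sketch of the per-movie count mentioned above, assuming the rows have already been split into (actor, movie, rank) tuples. The function name `limit_by_count` and the sample rows are made up for illustration and are not part of the original script.

```python
def limit_by_count(rows, limit=10):
    """Build tab-joined actor strings, keeping at most `limit` actors per movie."""
    movietoactor = {}
    counts = {}  # separate counter for every movie
    for actor, movie, rank in rows:
        if counts.get(movie, 0) >= limit:
            # This movie is full; skip the row but keep reading other movies.
            continue
        movietoactor[movie] = movietoactor.get(movie, "") + "\t%s" % actor
        counts[movie] = counts.get(movie, 0) + 1
    return movietoactor


# Hypothetical data: one movie with 12 actors, one with a single actor.
rows = [("Actor %d" % n, "Some Movie (2000)", str(n)) for n in range(1, 13)]
rows.append(("Hoffman, Dustin", "American Buffalo (1996)", "1"))
limited = limit_by_count(rows)
```

Note the use of continue rather than break: break would end the whole loop as soon as one movie reached the limit.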


This would be easier if you used a list for each movie:

if movie not in movietoactor:
    movietoactor[movie] = []
if len(movietoactor[movie]) < 10:
    movietoactor[movie].append(actor)
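
Put together as a complete loop, the list-based approach might look like the sketch below. It assumes each input line has exactly three tab-separated fields, as in the question; `group_actors` and the sample lines are illustrative names and data only. In the real script, `lines` would come from the gzip file.

```python
def group_actors(lines, limit=10):
    """Map each movie to a list of at most `limit` actors, in file order."""
    movietoactor = {}
    for line in lines:
        # rstrip("\r\n") removes only line endings. The question's
        # rstrip('\r\n\s') also strips trailing backslashes and the letter s.
        actor, movie, rank = line.rstrip("\r\n").split("\t")
        actors = movietoactor.setdefault(movie, [])
        if len(actors) < limit:
            actors.append(actor)
    return movietoactor


# Hypothetical input: 12 lines for one movie, 1 line for another.
sample = ["Actor %d\tSome Movie (2000)\t%d\n" % (n, n) for n in range(1, 13)]
sample.append("Hoffman, Dustin\tAmerican Buffalo (1996)\t1\n")
result = group_actors(sample)
```

Because the length check happens per movie, no separate counter is needed, and the rest of the file is still read after one movie fills up.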

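If your other script is also in Python, there is no need to try to parse the pickle file; just use pickle.load to get the actual data structure back. You can keep the data in lists, and they will be restored as lists in the other script, rather than needing e.g. split('\t'). This makes the data much easier to work with, and is the whole point of using pickle.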
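
A small round-trip sketch of that idea, written for Python 3, where pickle files must be opened in binary mode ('wb' / 'rb'). The temporary directory is used only to keep the example self-contained; the file name is taken from the question.

```python
import os
import pickle
import tempfile

movietoactor = {
    "American Buffalo (1996)": [
        "Franz, Dennis (I)",
        "Hoffman, Dustin",
        "Nelson, Sean (I)",
    ],
}

# Write the dictionary of lists as-is; no manual string formatting needed.
path = os.path.join(tempfile.mkdtemp(), "mov2act.pickle")
with open(path, "wb") as f:
    pickle.dump(movietoactor, f)

# In the other script: load it back and get real lists again.
with open(path, "rb") as f:
    restored = pickle.load(f)

first_actor = restored["American Buffalo (1996)"][0]
```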

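If it is not in Python, there may be simpler formats for transferring the data, e.g. csv, where each line starts with the movie name followed by up to ten actors: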

'American Buffalo (1996)','Franz, Dennis (I)','Hoffman, Dustin','Nelson, Sean (I)'
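
One way to produce such a file is the standard csv module, sketched here against an in-memory buffer so it runs on its own. With default settings the writer uses double quotes rather than the single quotes shown above, and quotes only fields that need it, such as names containing a comma.

```python
import csv
import io

movietoactor = {
    "American Buffalo (1996)": [
        "Franz, Dennis (I)",
        "Hoffman, Dustin",
        "Nelson, Sean (I)",
    ],
}

buffer = io.StringIO()  # stand-in for a real file opened with newline=""
writer = csv.writer(buffer)
for movie, actors in movietoactor.items():
    # One row per movie: the title first, then at most ten actors.
    writer.writerow([movie] + actors[:10])

# Reading it back splits the fields correctly despite the embedded commas.
buffer.seek(0)
rows = list(csv.reader(buffer))
```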

Alternatively, take a look at json.
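
As a rough illustration, a dictionary of lists maps directly onto a JSON object of arrays, so the standard json module can write and read it without any custom formatting. The data below is sample data only.

```python
import json

movietoactor = {
    "American Buffalo (1996)": [
        "Franz, Dennis (I)",
        "Hoffman, Dustin",
        "Nelson, Sean (I)",
    ],
}

# Serialize to text; json.dump(movietoactor, f) would write to a file instead.
text = json.dumps(movietoactor, indent=2)

# Any JSON-aware language can parse this; in Python it comes back as dict/list.
restored = json.loads(text)
```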

If you really want to stick with the format you have, you can convert each list to a string before pickling:

for movie in movietoactor:
    movietoactor[movie] = "\t".join(movietoactor[movie])