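I have a file that contains a list of bands along with their albums and the year each album was released. I need to write a function that reads this file, finds the distinct band names, and counts how many times each band appears in it.
The file looks like this: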
Beatles - Revolver (1966)
Nirvana - Nevermind (1991)
Beatles - Sgt Pepper's Lonely Hearts Club Band (1967)
U2 - The Joshua Tree (1987)
Beatles - The Beatles (1968)
Beatles - Abbey Road (1969)
Guns N' Roses - Appetite For Destruction (1987)
Radiohead - Ok Computer (1997)
Led Zeppelin - Led Zeppelin 4 (1971)
U2 - Achtung Baby (1991)
Pink Floyd - Dark Side Of The Moon (1973)
Michael Jackson -Thriller (1982)
Rolling Stones - Exile On Main Street (1972)
Clash - London Calling (1979)
U2 - All That You Can't Leave Behind (2000)
Weezer - Pinkerton (1996)
Radiohead - The Bends (1995)
Smashing Pumpkins - Mellon Collie And The Infinite Sadness (1995)
.
.
.
The output must be sorted by frequency in descending order, like this:
band1: number1
band2: number2
band3: number3
Here is my code so far:
def read_albums(filename) :
    file = open("albums.txt", "r")
    bands = {}
    for line in file :
        words = line.split()
        for word in words:
            if word in '-' :
                del(words[words.index(word):])
        string1 = ""
        for i in words :
            list1 = []
            string1 = string1 + i + " "
            list1.append(string1)
        for k in list1 :
            if (k in bands) :
                bands[k] = bands[k] +1
            else :
                bands[k] = 1
    for word in bands :
        frequency = bands[word]
        print(word + ":", len(bands))
I think there is a simpler way to do this, but I'm not sure. Also, I'm not sure how to sort the dictionary by frequency. Do I need to convert it to a list?
Answer 0 (score: 2)
You're right, there is a simpler way, using Counter:
from collections import Counter
with open('bandfile.txt') as f:
    # line.strip() is falsy for a blank line ("\n"), so blank lines are skipped
    counts = Counter(line.split('-')[0].strip() for line in f if line.strip())
for band, count in counts.most_common():
    print("{0}: {1}".format(band, count))
What exactly does line.split('-')[0].strip() for line in f, together with the if condition after it, do?
That expression is a shorthand for the following loop:
temp_list = []
for line in f:
    if line.strip():  # this makes sure to skip blank lines ("\n" alone is truthy, so test the stripped text)
        bits = line.split('-')
        temp_list.append(bits[0].strip())  # lists have append(), not add()
counts = Counter(temp_list)
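Unlike the loop above, it does not build an intermediate list. Instead it creates a generator expression, a more memory-efficient way to step through the items one at a time, which is passed directly to Counter as its argument.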
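A quick way to convince yourself the two forms are equivalent is to build the Counter both ways on the same input and compare. This is a minimal sketch that uses a hard-coded list named lines (hypothetical data) in place of the file; it also shows that the generator expression is a lazy generator object rather than a list.

```python
from collections import Counter

# Hypothetical input standing in for the file, including one blank line.
lines = [
    "Beatles - Revolver (1966)\n",
    "\n",
    "Radiohead - Ok Computer (1997)\n",
    "Beatles - Abbey Road (1969)\n",
]

# Explicit loop: builds an intermediate list first.
temp_list = []
for line in lines:
    if line.strip():  # skip blank lines
        temp_list.append(line.split('-')[0].strip())
from_loop = Counter(temp_list)

# Generator expression: items are produced one at a time as Counter consumes them.
gen = (line.split('-')[0].strip() for line in lines if line.strip())
print(type(gen).__name__)  # generator
from_gen = Counter(gen)

print(from_loop == from_gen)  # True
```

Note that a generator can only be consumed once; after Counter(gen) runs, gen is exhausted.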
Answer 1 (score: 1)
If you're after brevity, use defaultdict and sorted:
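from collections import defaultdict

bands = defaultdict(int)  # missing keys start at 0
with open('tmp.txt') as f:
    for line in f:  # file objects are iterable; xreadlines() no longer exists in Python 3
        if not line.strip():
            continue  # skip blank lines
        band = line.split('-')[0].strip()  # also handles "Michael Jackson -Thriller"
        bands[band] += 1

for band, count in sorted(bands.items(), key=lambda t: t[1], reverse=True):
    print('%s: %d' % (band, count))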
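On the question's last point: a dict cannot be sorted in place, but you don't have to convert it by hand either. sorted() accepts bands.items() directly and returns a new list of (band, count) tuples. A minimal sketch with a hypothetical, hard-coded bands dict, showing the lambda key used above next to the equivalent operator.itemgetter(1):

```python
import operator

# Hypothetical counts standing in for the dict built from the file.
bands = {"Nirvana": 1, "Beatles": 4, "U2": 3, "Radiohead": 2}

# sorted() returns a new list; the dict itself is left unchanged.
by_lambda = sorted(bands.items(), key=lambda t: t[1], reverse=True)
by_itemgetter = sorted(bands.items(), key=operator.itemgetter(1), reverse=True)

print(by_lambda)
# [('Beatles', 4), ('U2', 3), ('Radiohead', 2), ('Nirvana', 1)]
print(by_lambda == by_itemgetter)  # True
```

Python's sort is stable, so bands with equal counts keep the order in which they appear in bands.items().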
Answer 2 (score: 0)
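My approach is to use the split() method to break each line of the file into a list of its component tokens. Then you can take the band name (the first token in the list) and add the names to a dictionary that keeps track of the counts: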
import operator

def main():
    band_counts = {}
    # build a dictionary that adds each band as it is listed, then increments the count for re-lists
    with open("albums.txt") as f:
        for line in f:
            line_items = line.split("-")  # break up the line into individual tokens
            band = line_items[0].strip()  # drop the space before the dash and any newline
            # don't want to add blank lines to the band list
            if not band:
                continue
            if band in band_counts:
                band_counts[band] += 1  # band already in the counts, increment the count
            else:
                band_counts[band] = 1  # if the band was not already in counts, add it with a count of 1
    # create a list of results sorted by count, highest first
    sorted_list = sorted(band_counts.items(), key=operator.itemgetter(1), reverse=True)
    for item in sorted_list:
        print(item[0] + ":", item[1])

if __name__ == "__main__":
    main()
Note: