I have a file containing the following text:
1. Beatles - Revolver (1966)
2. Nirvana - Nevermind (1991)
3. Beatles - Sgt Pepper's Lonely Hearts Club Band (1967)
4. U2 - The Joshua Tree (1987)
5. Beatles - The Beatles (The White Album) (1968)
6. Beatles - Abbey Road (1969)
7. Guns N' Roses - Appetite For Destruction (1987)
8. Radiohead - Ok Computer (1997)
9. Led Zeppelin - Led Zeppelin 4 (1971)
10. U2 - Achtung Baby (1991)
11. Pink Floyd - Dark Side Of The Moon (1973)
12. Michael Jackson -Thriller (1982)
13. Rolling Stones - Exile On Main Street (1972)
14. Clash - London Calling (1979)
15. U2 - All That You Can't Leave Behind (2000)
16. Weezer - Pinkerton (1996)
17. Radiohead - The Bends (1995)
18. Smashing Pumpkins - Mellon Collie And The Infinite Sadness (1995)
19. Pearl Jam - Ten (1991)
20. Beach Boys - Pet Sounds (1966)
21. Weezer - Weezer (1994)
22. Nirvana - In Utero (1993)
23. Beatles - Rubber Soul (1965)
24. Eminem -The Eminem Show (2002)
25. R.E.M. - Automatic For The People (1992)
26. Radiohead - Kid A (2000)
27. Tool - Aenima (1996)
28. Smashing Pumpkins - Siamese Dream (1993)
29. Madonna - Ray Of Light (1998)
30. Rolling Stones - Sticky Fingers (1971)
...till line 99.
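I need to store this information in a dictionary whose keys are the band names and whose values are lists of each band's best albums. Each entry in a list is a two-field tuple: the album name and the year it was released. I also want to get rid of the punctuation and the parentheses. Can anyone help?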
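To make the goal concrete, here is a rough sketch of the structure described above, filled in by hand for the first five lines of the file. The variable name albums_by_band is hypothetical, and storing the year as an integer and keeping the apostrophe in album titles are assumptions, since the question does not settle either point.

```python
# Hypothetical target structure for lines 1-5 of the file.
# Assumption: years are stored as integers and only the rank number,
# the separating hyphen and the parentheses around the year are dropped.
albums_by_band = {
    "Beatles": [
        ("Revolver", 1966),
        ("Sgt Pepper's Lonely Hearts Club Band", 1967),
        ("The Beatles (The White Album)", 1968),
    ],
    "Nirvana": [("Nevermind", 1991)],
    "U2": [("The Joshua Tree", 1987)],
}

# Each value is a list of (album, year) tuples that can be unpacked directly.
for album, year in albums_by_band["Beatles"]:
    print(album, year)
```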
Answer 0 (score: 2)
Try this as a starting point. It is by no means perfect; you will need to take it from here and adapt it to your needs.
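import re

my_dict = {}
# `songs` is assumed to be an iterable of lines, e.g. the open file
for record in songs:
    # split only at the first hyphen so hyphens in album titles survive
    parts = record.split('-', 1)
    if len(parts) != 2:
        continue
    year = re.findall(r'\(([0-9]{4})\)', record)
    band = re.findall(r'[0-9]+\. (.*)', parts[0])
    song = re.findall(r'(.*) \(', parts[1].strip())
    if song and band and year:
        # findall returns lists; pull out the matched strings
        key = band[0].strip()
        entry = (song[0], year[-1])
        if key in my_dict:  # already present, append
            my_dict[key].append(entry)
        else:  # create new entry
            my_dict[key] = [entry]
print(my_dict)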
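Answer 1 (score: 1)
What I would do is read each line from the file, treat it as a string, split it at the "." character, and then use the first piece as the key and the second piece as the value. For example: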
albumDict = {}
with open("/path/to/file", "r") as file:
    for line in file:
        # split once, at the "." that follows the rank number
        splitLine = line.split(".", 1)
        albumDict[splitLine[0]] = splitLine[1].strip()
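Edit: Note that this does not check for duplicate entries and should not be used as-is in a professional setting. If other people are going to use it, add a check to make sure the key does not already exist.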
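The duplicate check mentioned in the note could look something like the sketch below. The helper name add_entry and the sample lines are hypothetical, and the sketch assumes the same "rank. text" line format as the file in the question.

```python
def add_entry(album_dict, line):
    """Add one 'rank. text' line to album_dict.

    Returns True if the entry was added, False if the line has no "."
    or its key is already present (the existing value is left untouched).
    """
    key, sep, value = line.partition(".")
    if not sep:
        return False  # no "." in the line, nothing to add
    key = key.strip()
    if key in album_dict:
        return False  # duplicate key, keep the first entry
    album_dict[key] = value.strip()
    return True


album_dict = {}
lines = [
    "1. Beatles - Revolver (1966)",
    "2. Nirvana - Nevermind (1991)",
    "1. Beatles - Revolver (1966)",  # repeated on purpose
]
results = [add_entry(album_dict, line) for line in lines]
print(results)
```

Returning a flag instead of raising leaves it to the caller to decide whether a repeated key is an error or can simply be skipped.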
Answer 2 (score: 1)
Here is a solution that may suit you better:
import re
from collections import defaultdict

band_dict = defaultdict(list)
pattern = re.compile(r"\d+\. (?P<band>.+?) -\s?(?P<album>.+?) \((?P<year>\d+)\)")
with open("musiclist") as f:
    for line in f:
        match = pattern.match(line)
        if match:
            groupdict = match.groupdict()
            band_dict[groupdict['band']].append((groupdict['album'], groupdict['year']))
        else:
            print("Error, no match for line %s" % line)

for band in band_dict:
    print(band)
    for album, year in band_dict[band]:
        print("\t%s: %s" % (album, year))
Running this code on a file named musiclist that contains the data you provided prints:
Pink Floyd
    Dark Side Of The Moon: 1973
Beatles
    Revolver: 1966
    Sgt Pepper's Lonely Hearts Club Band: 1967
    The Beatles (The White Album): 1968
    Abbey Road: 1969
    Rubber Soul: 1965
Clash
    London Calling: 1979
Rolling Stones
    Exile On Main Street: 1972
    Sticky Fingers: 1971
Led Zeppelin
    Led Zeppelin 4: 1971
R.E.M.
    Automatic For The People: 1992
Guns N' Roses
    Appetite For Destruction: 1987
U2
    The Joshua Tree: 1987
    Achtung Baby: 1991
    All That You Can't Leave Behind: 2000
Nirvana
    Nevermind: 1991
    In Utero: 1993
Pearl Jam
    Ten: 1991
Tool
    Aenima: 1996
Beach Boys
    Pet Sounds: 1966
Madonna
    Ray Of Light: 1998
Radiohead
    Ok Computer: 1997
    The Bends: 1995
    Kid A: 2000
Eminem
    The Eminem Show: 2002
Weezer
    Pinkerton: 1996
    Weezer: 1994
Smashing Pumpkins
    Mellon Collie And The Infinite Sadness: 1995
    Siamese Dream: 1993
Michael Jackson
    Thriller: 1982
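The bands in the output above appear in whatever order the dictionary yields them. If a stable order matters, one possible variation is sketched below: it reuses the same pattern, stores the year as an integer, and sorts bands by name and albums by year. The function name build_band_dict and the in-memory sample list are hypothetical stand-ins for reading the musiclist file.

```python
import re
from collections import defaultdict

# Same pattern as in the answer above.
PATTERN = re.compile(r"\d+\. (?P<band>.+?) -\s?(?P<album>.+?) \((?P<year>\d+)\)")


def build_band_dict(lines):
    """Map each band name to a list of (album, year) tuples, year as int."""
    band_dict = defaultdict(list)
    for line in lines:
        match = PATTERN.match(line)
        if match:
            entry = (match.group("album"), int(match.group("year")))
            band_dict[match.group("band")].append(entry)
    return band_dict


# A few lines copied from the question, used instead of the file.
sample = [
    "1. Beatles - Revolver (1966)",
    "5. Beatles - The Beatles (The White Album) (1968)",
    "12. Michael Jackson -Thriller (1982)",
    "23. Beatles - Rubber Soul (1965)",
    "25. R.E.M. - Automatic For The People (1992)",
]

band_dict = build_band_dict(sample)
for band in sorted(band_dict):
    print(band)
    # sort each band's albums by release year
    for album, year in sorted(band_dict[band], key=lambda entry: entry[1]):
        print("\t%s: %d" % (album, year))
```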