I have a tab-separated file like this:
20001 World Economies
20002 Bill Clinton
20004 Internet Law
20005 Philipines Elections
20006 Israel Politics
20008 Golf
20009 Music
20010 Disasters
It is a huge file, consisting of 100 such pairs. How can I use this file to create a dictionary in Python?
def get_pair(line):
    key, sep, value = line.strip().partition("\t")
    return int(key), value
with open("TopicMapped.txt") as fd:
    d = dict(get_pair(line) for line in fd)
fd=open('dictionary.txt', 'w')
print >> fd, d
However, printing this dictionary to the file gives me an empty file. Why is that?
Answer 0: (score: 3)
You can do this easily with the following simple code:
fID = open('TopicMapped.txt')
myDict = dict()  # init empty dictionary
# read the file line by line (if it's huge, loading it entirely into memory, e.g. with readlines(), might be cumbersome)
for line in fID:
    # remove the trailing newline
    line = line.rstrip()
    # create a list where the first element is the number and the second element is the text
    line = line.split("\t")
    # update dictionary
    myDict[line[0]] = line[1]
print(myDict)
fID.close()
This code returns the following dictionary:
{'20010': 'Disasters', '20006': 'Israel Politics', '20005': 'Philipines Elections', '20004': 'Internet Law', '20002': 'Bill Clinton', '20001': 'World Economies', '20009': 'Music', '20008': 'Golf'}
If you want the numbers to be integers rather than strings, you can do something like
myDict[int(line[0])]=line[1] #update dictionary
The resulting dictionary will be
{20001: 'World Economies', 20002: 'Bill Clinton', 20004: 'Internet Law', 20005: 'Philipines Elections', 20006: 'Israel Politics', 20008: 'Golf', 20009: 'Music', 20010: 'Disasters'}
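The loop above assumes that every line contains a tab. As a hedged sketch in Python 3 syntax, the same idea can be wrapped in a function that skips lines without one, such as blank lines, which would otherwise raise an IndexError at `line[1]`. The function name `load_topics` and the sample file written to a temporary directory are assumptions for illustration, not part of the original answer.

```python
import os
import tempfile


def load_topics(path):
    """Build a {number: topic} dict from a tab-separated file (hypothetical helper)."""
    topics = {}
    with open(path) as fh:
        for line in fh:
            # partition returns an empty separator when the line has no tab
            key, sep, value = line.rstrip("\n").partition("\t")
            if not sep:
                continue  # skip blank or malformed lines
            topics[int(key)] = value
    return topics


# Assumed sample data: two valid lines and one blank line.
sample_path = os.path.join(tempfile.mkdtemp(), "TopicMapped.txt")
with open(sample_path, "w") as fh:
    fh.write("20001\tWorld Economies\n\n20008\tGolf\n")

topics = load_topics(sample_path)
print(topics)
```

Because the file is read inside a with block, it is closed automatically even if a line fails to parse.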
Answer 1: (score: 3)
Your own code actually works. It only looks like it gives you an empty file because you are checking the file before closing it:
In [15]: fd=open('dictionary.txt', 'w')
In [16]: print >> fd, d
# looks empty
In [17]: cat dictionary.txt
# actually close the file so what is in the buffer is written to disk
In [18]: fd.close()
# now you see the data
In [19]: cat dictionary.txt
{20001: ' World Economies', 20002: ' Bill Clinton', 20004: ' Internet Law', 20005: ' Philipines Elections', 20006: ' Israel Politics', 20008: ' Golf', 20009: ' Music', 20010: ' Disasters'}
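The same sequence can be written as a short script. This is a minimal sketch in Python 3 syntax (the session above is Python 2); the sample dictionary `d` and the temporary file path are assumptions for illustration.

```python
import os
import tempfile

# Hypothetical sample data standing in for the dictionary built in the question.
d = {20001: "World Economies", 20002: "Bill Clinton"}

# Write into a temporary directory so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "dictionary.txt")

fd = open(path, "w")
print(d, file=fd)  # the text may still be sitting in the write buffer here
fd.close()         # closing the file flushes the buffer to disk

# Reading the file back after close() shows the full contents.
with open(path) as check:
    written = check.read()

print(written)
```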
You can do this with a dict comprehension, and open the files using with, which closes them automatically and avoids simple mistakes like the one in the code above:
In [7]: with open("text.txt") as f:
   ...:     dct = {int(k): v.rstrip() for line in f for k, v in (line.split(None, 1),)}
   ...:
In [8]: dct
Out[8]:
{20001: 'World Economies',
20002: 'Bill Clinton',
20004: 'Internet Law',
20005: 'Philipines Elections',
20006: 'Israel Politics',
20008: 'Golf',
20009: 'Music',
20010: 'Disasters'}
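The comprehension above relies on `line.split(None, 1)`, which splits on the first run of whitespace only, so topic names that contain spaces stay in one piece. A small sketch on a single line shows why the `rstrip()` call is still needed; the sample line is an assumption, written in Python 3 syntax.

```python
# Assumed sample line, as it would be read from the tab-separated file.
line = "20006\tIsrael Politics\n"

# Split at most once on any whitespace: the key, then the rest of the line.
k, v = line.split(None, 1)

# The remainder keeps its trailing newline, so it is stripped before storing.
dct = {int(k): v.rstrip()}

print(repr(k), repr(v), dct)
```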
If you want to store it in a file, use the json module:
In [13]: import json
In [14]: with open("text.txt") as f, open("out.json","w") as out:
   ....:     json.dump({int(k): v.rstrip() for line in f for k, v in (line.split(None, 1),)}, out)
   ....:
In [15]: cat out.json
{"20001": "World Economies", "20002": "Bill Clinton", "20004": "Internet Law", "20005": "Philipines Elections", "20006": "Israel Politics", "20008": "Golf", "20009": "Music", "20010": "Disasters"}
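JSON object keys are always strings, so json.dump converts the integer keys to strings. If you really want integers, you can pickle your dictionary: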
In [8]: import pickle
In [9]: with open("text.txt") as f, open("out.pkl","wb") as out:
   ...:     pickle.dump({int(k): v.rstrip() for line in f for k, v in (line.split(None, 1),)}, out)
   ...:
In [10]: with open("out.pkl","rb") as in_fle:
   ....:     dct = pickle.load(in_fle)
   ....:
In [11]: dct
Out[11]:
{20001: 'World Economies',
20002: 'Bill Clinton',
20004: 'Internet Law',
20005: 'Philipines Elections',
20006: 'Israel Politics',
20008: 'Golf',
20009: 'Music',
20010: 'Disasters'}
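If a JSON file is preferred over a pickle, the string keys can be converted back to integers when the file is loaded. The following is a minimal sketch of that round trip in Python 3 syntax; the sample dictionary and the temporary path are assumptions for illustration.

```python
import json
import os
import tempfile

dct = {20001: "World Economies", 20008: "Golf"}  # assumed sample data

path = os.path.join(tempfile.mkdtemp(), "out.json")
with open(path, "w") as out:
    json.dump(dct, out)  # integer keys are written as JSON strings

with open(path) as in_file:
    # convert each key back to int after loading
    restored = {int(k): v for k, v in json.load(in_file).items()}

print(restored == dct)
```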
You can also use the csv lib for parsing:
import csv
with open("text.txt") as f:
    dct = {int(k): v for k,v in csv.reader(f, delimiter="\t")}