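I wrote a parser that extracts questions from past exam papers and tallies how often a particular question/topic has appeared over the years. It stores the questions/topics as dictionary keys and the dates as lists, and should combine the two like this: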
questions = {'Question1':['April 2011', 'May 2016'], 'Question2': ['June 2013']}
The problem is that I can't update the list of dates in the dictionary. My code snippet is below:
import re

import PyPDF2

def extract_topics_dates(file):
    corpus = ''
    topics = []
    questions = {}
    year = []
    pdf_reader = PyPDF2.PdfFileReader(open(file, 'rb'))
    for page in pdf_reader.pages:
        # For each page, get corpus of text.
        for line in page.extractText().splitlines():
            corpus = corpus + line
        # For each page, extract topics.
        for i in [phrase for phrase in map(str.strip, re.split(r'\d+\s\s', corpus)) if phrase]:
            topics.append(extract_topic(i))
        topics = [x for x in topics if x is not None]
        # For each page, extract date.
        year = set([x for x in year if x is not None])
        year.add(get_date(page))
        # For each page, now combine the topic + date.
        for i in topics:
            questions[i].add(year)
    return questions
Everything in this function works as expected, except the last statement, questions[i].add(year), which raises a KeyError. Where am I going wrong?
Answer 0 (score: 1)
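questions starts out empty, so questions[i] raises a KeyError for any topic that has not been stored yet. You need to create a list for a key before you can add to it, and lists use append, not add. Change the for loop to the following: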
for i in topics:
    if i not in questions:
        questions[i] = list()
    questions[i].append(year)
Alternatively, as @Jon Clements suggested:
for topic in topics:
    questions.setdefault(topic, []).append(year)
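To see the setdefault pattern produce the structure from the question, here is a minimal, self-contained sketch. It assumes each topic is paired with a single date string rather than the set built in the original function; the helper name group_dates_by_topic and the sample pairs are hypothetical and only stand in for the parsed PDF pages.

```python
def group_dates_by_topic(pairs):
    """Build {topic: [date, ...]} from (topic, date) pairs."""
    questions = {}
    for topic, date in pairs:
        # setdefault returns the list already stored under topic,
        # or stores a new empty list and returns that one.
        dates = questions.setdefault(topic, [])
        if date not in dates:  # skip a date already recorded for this topic
            dates.append(date)
    return questions


# Hypothetical sample data, not taken from any real exam paper.
pairs = [
    ("Question1", "April 2011"),
    ("Question2", "June 2013"),
    ("Question1", "May 2016"),
]
questions = group_dates_by_topic(pairs)
print(questions)
```

Note that in the original function year is a set, so appending it would store a whole set per page; appending the individual date string, as sketched here, is probably closer to the desired output.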