Question

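I've written a parser that extracts questions from past exam papers and lists how often a particular question/topic has come up over the years. It stores the questions/topics as dictionary keys and the dates as lists of values, and should combine the two like this: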

questions = {'Question1':['April 2011', 'May 2016'], 'Question2': ['June 2013']}
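For reference, here is a minimal sketch of how that target structure behaves. Each value is an ordinary list, extended with append, and the number of times a topic appeared is simply the length of its list. The extra date 'June 2019' is a made-up value for illustration only.

```python
questions = {'Question1': ['April 2011', 'May 2016'], 'Question2': ['June 2013']}

# Each value is a plain list, so a new date is added with list.append
# ('June 2019' is a hypothetical date, not from the parser)
questions['Question1'].append('June 2019')

# How often a topic appeared is just the length of its date list
counts = {topic: len(dates) for topic, dates in questions.items()}
print(counts)  # {'Question1': 3, 'Question2': 1}
```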

The problem is that I can't update the list of dates inside the dictionary. Here is the relevant part of my code:

import re
import PyPDF2

def extract_topics_dates(file):
    corpus = ''
    topics = []
    questions = {}
    year = []
    pdf_reader = PyPDF2.PdfFileReader(open(file, 'rb'))
    for page in pdf_reader.pages:
        #For each page, get corpus of text.
        for line in page.extractText().splitlines():
            corpus = corpus + line

        #For each page, extract topics (extract_topic is my own helper, not shown).
        for i in [phrase for phrase in map(str.strip, re.split(r'\d+\s\s', corpus)) if phrase]:
            topics.append(extract_topic(i))
        topics = [x for x in topics if x is not None]

        #For each page, extract date (get_date is my own helper, not shown).
        year = set([x for x in year if x is not None])
        year.add(get_date(page))

        #For each page, now combine the topic + date.
        for i in topics:
            questions[i].add(year)

    return questions

Everything in this function works as expected except the last line, questions[i].add(year), which raises a KeyError. Where am I going wrong?
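The error has a simple cause: questions starts out empty, so questions[i] looks up a key that does not exist yet. The minimal reproduction below (the key and date are just the example values from above) also shows a second problem that would surface right after the KeyError is fixed: the values are meant to be lists, and lists have append, not add.

```python
questions = {}

# 1) Indexing a key that was never assigned raises KeyError
try:
    questions['Question1'].add('April 2011')
except KeyError as exc:
    print('KeyError:', exc)  # KeyError: 'Question1'

# 2) Even once the key exists, a list has no .add method (.add belongs to set)
questions['Question1'] = []
try:
    questions['Question1'].add('April 2011')
except AttributeError as exc:
    print('AttributeError:', exc)

# list.append is the method that extends a list in place
questions['Question1'].append('April 2011')
print(questions)  # {'Question1': ['April 2011']}
```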

Answer 1

You need to create a list for a key before you can add anything to it. Change the for loop to the following:

for i in topics:
    if i not in questions:
        questions[i] = list()
    questions[i].append(year)
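To see this fix in action without a PDF, here is a minimal sketch that replaces the parser with a hypothetical list of (topics, date) pairs, one per page. It assumes each page yields a single date string. In the question's code year is a set, so with that code each list element would be a set rather than a string such as 'April 2011'.

```python
questions = {}

# Hypothetical stand-in for the parsed PDF: (topics on the page, date of the page)
pages = [
    (['Question1'], 'April 2011'),
    (['Question2'], 'June 2013'),
    (['Question1'], 'May 2016'),
]

for topics, year in pages:
    for i in topics:
        # Create the empty list the first time a topic is seen
        if i not in questions:
            questions[i] = list()
        questions[i].append(year)

print(questions)
# {'Question1': ['April 2011', 'May 2016'], 'Question2': ['June 2013']}
```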

Alternatively, as @Jon Clements suggested:

for topic in topics:
    questions.setdefault(topic, []).append(year)
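A third option, not mentioned in the answer, is collections.defaultdict from the standard library, which creates the empty list automatically the first time a missing key is accessed. A minimal sketch using the same hypothetical example pairs:

```python
from collections import defaultdict

# Hypothetical (topic, date) pairs standing in for the parser's output
pairs = [('Question1', 'April 2011'), ('Question2', 'June 2013'), ('Question1', 'May 2016')]

# defaultdict(list) calls list() to supply a value whenever a missing key is looked up
questions = defaultdict(list)
for topic, year in pairs:
    questions[topic].append(year)

# Convert to a plain dict so later lookups of unknown keys raise KeyError again
questions = dict(questions)
print(questions)  # {'Question1': ['April 2011', 'May 2016'], 'Question2': ['June 2013']}
```

The trade-off: setdefault evaluates its [] argument on every call even when the key already exists, while defaultdict only builds a list for new keys but silently creates an entry for any key you look up, including typos. That is why the sketch converts back to a plain dict at the end.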

