Question

I'm trying to create a Google Ngram-esque program in Python (a CS-I project). I have a CSV file that looks like this:

aardvark, 2007, 123948
aardvark, 2008, 120423
aardvark, 2004, 96323
gorilla, 2010, 120302
gorilla, 2008, 89323
raptorjesus, 1996, 214

The first value is the word, the second is the year in which the occurrences were counted, and the third is the number of occurrences.

I have a class CountByYear that takes a word, a year, and a frequency, and returns a CountByYear object.

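I need to read through the CSV file and print a dictionary with the words as keys and lists of CountByYear objects as values (without the word). For example: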

{'aardvark': [CountByYear(year=2007, count=123948), CountByYear(year=2008...etc.], 'gorilla': [CountByYear(year=2010, count=120302), etc...]}

I'm stuck on how I should actually get the year and count into each object. Right now I'm doing:

for line in f:
    splitLine = line.strip().split(',')
    words[splitLine[0]] = countList
print(words)

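This prints {'aardvark': [], 'gorilla': [], 'raptorjesus': []}, which is fine, because at least I know I'm doing the dictionary part correctly. But how do I fill those empty lists with the data I want?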

Answer 1

You didn't include an example of the CountByYear class, but you specified that it has a constructor taking "word", "year", and "frequency".

Assuming a definition like this:

class CountByYear(object):
    def __init__(self, word, year, frequency):
        self.word = word
        self.year = year
        self.frequency = frequency

    def __repr__(self):
        return "CountByYear(year=%s, count=%s)" % (self.year, self.frequency)

You can do this:

words = {}
for line in f:
    word,year,freq = [i.strip() for i in line.split(',')]
    #create a new list if one does not already exist for this word
    if not words.get(word):
        words[word] = []
    #add this CountByYear object to corresponding list in the dictionary
    words[word].append(CountByYear(word,year,freq))
print(words)

The output of the above code for the sample input file is:

{'gorilla': [CountByYear(year=2010, count=120302), CountByYear(year=2008, count=89323)], 'aardvark': [CountByYear(year=2007, count=123948), CountByYear(year=2008, count=120423), CountByYear(year=2004, count=96323)], 'raptorjesus': [CountByYear(year=1996, count=214)]}
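For a version that runs without a file on disk, here is a minimal sketch that feeds the sample rows through io.StringIO and uses dict.setdefault in place of the explicit existence check. The build_index helper name is made up for illustration, the class is the assumed definition from above, and converting year and frequency to int is my own choice rather than something the question asks for.

```python
import io


class CountByYear(object):
    # Assumed definition, mirroring the class sketched earlier in this answer.
    def __init__(self, word, year, frequency):
        self.word = word
        self.year = year
        self.frequency = frequency

    def __repr__(self):
        return "CountByYear(year=%s, count=%s)" % (self.year, self.frequency)


# The sample rows from the question, held in memory instead of a file.
SAMPLE = """aardvark, 2007, 123948
aardvark, 2008, 120423
aardvark, 2004, 96323
gorilla, 2010, 120302
gorilla, 2008, 89323
raptorjesus, 1996, 214
"""


def build_index(lines):
    # Hypothetical helper: maps each word to a list of CountByYear objects.
    words = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines so the unpacking below cannot fail on them
        word, year, freq = [field.strip() for field in line.split(',')]
        # setdefault returns the existing list, or stores and returns a new one
        words.setdefault(word, []).append(CountByYear(word, int(year), int(freq)))
    return words


words = build_index(io.StringIO(SAMPLE))
print(words)
```

Because build_index accepts any iterable of lines, an open file object can be passed in place of the StringIO wrapper.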

Answer 2

One approach is to use a defaultdict. For example,

from collections import defaultdict

words = defaultdict(list)

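with open("data.csv", "r") as f:
    for line in f:
        # strip each field: the sample rows have a space after every comma
        key_name, year, count = [field.strip() for field in line.split(',')]
        # append one (year, count) pair per row; += [year, count] would flatten them
        words[key_name].append((year, count))
        # or  words[key_name].append(CountByYear(key_name, year, count)) or similar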

print(words)
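The repr the question asks for, CountByYear(year=..., count=...), happens to match what a collections.namedtuple prints, so the defaultdict idea can be tried end to end with a sketch like the one below. The namedtuple is only a stand-in for the asker's real class, which was not posted, and the inline rows and the int conversion are likewise assumptions.

```python
from collections import defaultdict, namedtuple

# Assumed stand-in for the asker's class; the real CountByYear may differ.
# Note: a field named "count" shadows the inherited tuple.count method.
CountByYear = namedtuple("CountByYear", ["year", "count"])

# The sample rows from the question, inlined so the sketch needs no file.
rows = [
    "aardvark, 2007, 123948",
    "aardvark, 2008, 120423",
    "aardvark, 2004, 96323",
    "gorilla, 2010, 120302",
    "gorilla, 2008, 89323",
    "raptorjesus, 1996, 214",
]

# defaultdict(list) creates an empty list the first time a word is looked up.
words = defaultdict(list)
for row in rows:
    word, year, count = [field.strip() for field in row.split(",")]
    words[word].append(CountByYear(year=int(year), count=int(count)))

# Convert to a plain dict so the printout omits the defaultdict wrapper.
print(dict(words))
```

The main design point is append versus +=: append adds one object per row, whereas += expects an iterable and extends the list with its elements.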

Answer 3

Try using the csv module (https://docs.python.org/3.4/library/csv.html) with something like:

import csv

words = {}
with open('eggs.csv', newline='') as csvfile:
    # the sample rows are comma-separated with a space after each comma
    reader = csv.reader(csvfile, skipinitialspace=True)

    for word, year, count in reader:
        words[word] = words.get(word, []) + [CountByYear(word, year, count)]

print(words)
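The sample rows in the question are comma-separated with a space after each comma, and csv.reader keeps that space as part of the field unless skipinitialspace=True is passed. A small sketch of that parsing step, assuming input shaped like the question's sample: parse_rows is a hypothetical helper, the text is read from memory rather than from a file, and plain (year, count) tuples stand in for CountByYear objects.

```python
import csv
import io

# A few of the question's sample rows, held in memory instead of a file.
SAMPLE = "aardvark, 2007, 123948\ngorilla, 2010, 120302\ngorilla, 2008, 89323\n"


def parse_rows(text):
    # Hypothetical helper: yields (word, year, count) with the numbers as ints.
    reader = csv.reader(io.StringIO(text, newline=""), skipinitialspace=True)
    for row in reader:
        if not row:
            continue  # csv.reader yields an empty list for a blank line
        word, year, count = row
        yield word, int(year), int(count)


words = {}
for word, year, count in parse_rows(SAMPLE):
    # Tuples stand in for CountByYear here; swap in the real constructor as needed.
    words.setdefault(word, []).append((year, count))

print(words)
```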

Putting objects into lists in a dictionary
