I'm trying to create a Google Ngram-esque program in Python (a CS-I project). I have a CSV file that looks like this:
aardvark, 2007, 123948
aardvark, 2008, 120423
aardvark, 2004, 96323
gorilla, 2010, 120302
gorilla, 2008, 89323
raptorjesus, 1996, 214
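The first value is the word, the second is the year in which the occurrences were counted, and the third is the number of occurrences.
I have a class, CountByYear, that takes a word, a year and a frequency and returns a CountByYear object.
I need to read through the CSV file and print a dictionary with the words as keys and lists of CountByYear objects as values (without the word). For example: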
{'aardvark': [CountByYear(year=2007, count=123948), CountByYear(year=2008...etc.], 'gorilla': [CountByYear(year=2010, count=120302), etc...]}
I'm stuck on how to actually get the year and count into each object. Right now I'm doing:
for line in f:
    splitLine = line.strip().split(',')
    words[splitLine[0]] = countList
print(words)
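This prints {'aardvark': [], 'gorilla': [], 'raptorjesus': []}, which is good, because at least I know I'm doing the dictionary part correctly. But how do I fill those empty lists with the data I want?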
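Answer 0: (score: 1)
You didn't include an example of your CountByYear class, but you specified that it has a constructor that takes "word", "year" and "frequency".
Assuming a definition like this: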
class CountByYear(object):
    def __init__(self, word, year, frequency):
        self.word = word
        self.year = year
        self.frequency = frequency

    def __repr__(self):
        return "CountByYear(year=%s, count=%s)" % (self.year, self.frequency)
You can do this:
words = {}
for line in f:
    word, year, freq = [i.strip() for i in line.split(',')]
    # create a new list if one does not already exist for this word
    if not words.get(word):
        words[word] = []
    # add this CountByYear object to the corresponding list in the dictionary
    words[word].append(CountByYear(word, year, freq))
print(words)
The output of the above code for the sample input file is:
{'gorilla': [CountByYear(year=2010, count=120302), CountByYear(year=2008, count=89323)], 'aardvark': [CountByYear(year=2007, count=123948), CountByYear(year=2008, count=120423), CountByYear(year=2004, count=96323)], 'raptorjesus': [CountByYear(year=1996, count=214)]}
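Note that the code above stores the year and frequency as strings, exactly as they come out of the file. If you want to sort or compare them, it probably makes sense to convert them to int first. Here is a minimal sketch of that idea; the int() conversion in the constructor and the build_index helper are my own assumptions for illustration, not something from the question:

```python
class CountByYear(object):
    """Assumed shape of the asker's class; the real one was not shown."""
    def __init__(self, word, year, frequency):
        self.word = word
        self.year = int(year)            # store numbers, not strings
        self.frequency = int(frequency)

    def __repr__(self):
        return "CountByYear(year=%s, count=%s)" % (self.year, self.frequency)


def build_index(lines):
    """Group CountByYear objects by word (hypothetical helper)."""
    words = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines instead of failing on the unpack
        word, year, freq = [part.strip() for part in line.split(',')]
        if word not in words:
            words[word] = []
        words[word].append(CountByYear(word, year, freq))
    return words


sample = [
    "aardvark, 2007, 123948",
    "aardvark, 2008, 120423",
    "aardvark, 2004, 96323",
]
index = build_index(sample)
# Sort each word's entries chronologically; this relies on year being an int.
for entries in index.values():
    entries.sort(key=lambda entry: entry.year)
print(index)
```

Since build_index accepts any iterable of lines, you could pass it an open file object as well as a list.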
Answer 1: (score: 0)
One way is to use defaultdict. For example,
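from collections import defaultdict

words = defaultdict(list)
with open("data.csv", "r") as f:
    for line in f:
        # strip each field, since the sample file has a space after every comma
        key_name, year, count = [field.strip() for field in line.split(',')]
        # append one (year, count) pair; += [year, count] would add two separate items
        words[key_name].append((year, count))
        # or words[key_name].append(CountByYear(key_name, year, count)) or similar
print(words)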
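To try the defaultdict approach without a file on disk, something like the following should work. It is only a sketch: the namedtuple is a hypothetical stand-in for your CountByYear class (it happens to print in the year=..., count=... form you asked for), read_counts is a name I made up, and io.StringIO plays the role of the open file.

```python
import io
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the asker's class, holding only year and count.
CountByYear = namedtuple("CountByYear", ["year", "count"])


def read_counts(f):
    """Return a defaultdict mapping each word to a list of CountByYear."""
    words = defaultdict(list)  # missing keys start out as an empty list
    for line in f:
        if not line.strip():
            continue  # ignore blank lines
        key_name, year, count = [field.strip() for field in line.split(',')]
        words[key_name].append(CountByYear(int(year), int(count)))
    return words


# In-memory sample taken from the question's data.
sample_file = io.StringIO(
    "gorilla, 2010, 120302\n"
    "gorilla, 2008, 89323\n"
    "raptorjesus, 1996, 214\n"
)
counts = read_counts(sample_file)
print(dict(counts))
```

The point of defaultdict(list) is that you never have to check whether a word is already in the dictionary before appending to its list.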
Answer 2: (score: 0)
Try using the csv module (https://docs.python.org/3.4/library/csv.html):
import csv

words = {}
with open('eggs.csv', newline='') as csvfile:
    # the sample data is comma-separated with a space after each comma
    reader = csv.reader(csvfile, delimiter=',', skipinitialspace=True)
    for word, year, count in reader:
        words[word] = words.get(word, []) + [CountByYear(word, year, count)]
print(words)
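For a self-contained way to check how the csv module handles the "comma plus space" format from the question, here is a rough sketch. The namedtuple is again a hypothetical stand-in for CountByYear, load_words is an illustrative name, and io.StringIO replaces the real file. I used dict.setdefault instead of rebuilding the list on every row, which is just a design preference.

```python
import csv
import io
from collections import namedtuple

# Hypothetical stand-in for the asker's class.
CountByYear = namedtuple("CountByYear", ["year", "count"])


def load_words(csvfile):
    """Build {word: [CountByYear, ...]} from a comma-separated file object."""
    words = {}
    # skipinitialspace=True drops the space that follows each comma
    reader = csv.reader(csvfile, skipinitialspace=True)
    for row in reader:
        if not row:
            continue  # csv.reader yields an empty list for blank lines
        word, year, count = row
        # setdefault returns the existing list, or stores and returns a new one
        words.setdefault(word, []).append(CountByYear(int(year), int(count)))
    return words


data = io.StringIO(
    "aardvark, 2007, 123948\n"
    "aardvark, 2008, 120423\n"
    "gorilla, 2010, 120302\n",
    newline="",
)
words = load_words(data)
print(words)
```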