Creating a dictionary from a text file in Python

Time: 2014-11-24 20:07:20

Tags: python list dictionary

I have to create a dictionary from a CSV file that looks like this:

'song, 2000, 184950'
'boom, 2009, 83729'
'boom, 2010, 284500'
'boom, 2011, 203889'
'pow, 2000, 385920'
'pow, 2001, 248930'

From this, I have to create a dictionary with the word as the key and a list of class objects as the value.

This is what I have so far...

class Counter():
   __slots__ = ('year', 'count')
   _types = (int, int)

def readfile(file):
   d = dict()
   with open(file) as f:
      for line in f:
          element = line.split(',')
          for word in element:
              if word in d:
                  d[word].append([Count(int(element[1]), int(element[2]))])
              else:
                  d[word] = [Count(int(element[1]), int(element[2]))]
   print(d)

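The output I get is strange: it gives me a dictionary that looks similar to what I want, but it uses the count (183930) as the key instead of the name. I also need it to add the class object to the existing value if the word is already in the dictionary.

For example, since 'boom' would already be in the dictionary as {'boom' : Count(year = 2009, count = 83729)}, I want a list of Count objects under that one key.

Expected output: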

{'song' : [Count(year= 2000, count= 184950)], 'boom' : [Count(year=2009, count=83729),
Count(year=2010, count= 284500), Count(year=2011, count=203889)], 'pow' : ...etc..} 

2 Answers:

Answer 0 (score: 0)

With this loop:

for word in element:
    if word in d:
        d[word].append([Count(int(element[1]), int(element[2]))])
    else:
        d[word] = [Count(int(element[1]), int(element[2]))]

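you iterate over every field in the line, not just the word. None of the keys exist yet on the first line, so the else branch runs each time and you effectively execute (ignoring whitespace around the keys):

d['song'] = [Count(int('2000'), int('184950'))]
d['2000'] = [Count(int('2000'), int('184950'))]
d['184950'] = [Count(int('2000'), int('184950'))]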
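
The effect can be reproduced in isolation with a small sketch. The namedtuple below is only a hypothetical stand-in for the asker's Count class, and the fields are stripped so that the keys print cleanly:

```python
from collections import namedtuple

# Hypothetical stand-in for the asker's Count class.
Count = namedtuple("Count", ["year", "count"])

line = "song, 2000, 184950"
element = [field.strip() for field in line.split(",")]

d = {}
for word in element:  # the bug: every field of the line becomes a key
    if word in d:
        d[word].append(Count(int(element[1]), int(element[2])))
    else:
        d[word] = [Count(int(element[1]), int(element[2]))]

print(sorted(d))  # ['184950', '2000', 'song']
```

The year and the count show up as keys next to the word, which matches the odd output described in the question.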

Just use:

for line in f:
    element = line.split(',')
    word = element[0].strip()  # only the first field is the key
    if word in d:
        # append the object itself, not a one-element list
        d[word].append(Count(int(element[1]), int(element[2])))
    else:
        d[word] = [Count(int(element[1]), int(element[2]))]

You can also drop the if word in d check by using collections.defaultdict:

import collections

def readfile(file):
   d = collections.defaultdict(list)
   with open(file) as f:
      for line in f:
          element = line.split(',')
          word = element[0].strip()
          d[word].append(Count(int(element[1]), int(element[2])))

   print(d)
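
For a version that can be run without the asker's class or a file on disk, here is a minimal sketch. It assumes a namedtuple named Count as a stand-in for the real class, and a hypothetical helper read_counts that takes any iterable of lines instead of a filename:

```python
import collections

# Hypothetical stand-in for the asker's Count class.
Count = collections.namedtuple("Count", ["year", "count"])

def read_counts(lines):
    """Group (year, count) records by the word in the first field."""
    d = collections.defaultdict(list)  # missing keys start as an empty list
    for line in lines:
        word, year, count = (field.strip() for field in line.split(","))
        d[word].append(Count(int(year), int(count)))
    return d

sample = ["song, 2000, 184950", "boom, 2009, 83729", "boom, 2010, 284500"]
counts = read_counts(sample)
print(counts["boom"])
# [Count(year=2009, count=83729), Count(year=2010, count=284500)]
```

An open file object can be passed in place of the list, since iterating over a file yields its lines.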

Answer 1 (score: 0)

Here is some simple code that does what you need.

from collections import defaultdict
from pprint import pprint


class Counter(object):
    __slots__ = ('year', 'count')

    def __init__(self, year, count):
        self.year = year
        self.count = count

    def __str__(self):
        return "Counter(year=%d, count=%d)" % (self.year, self.count)

    def __repr__(self):
        return self.__str__()


def import_counts():
    counts = defaultdict(list)
    with open('data.csv') as f:  # file() exists only in Python 2; open() works in both
        for line in f:
            name, year, count = line.split(',')
            name, year, count = name.strip(), int(year.strip()), int(count.strip())
            counts[name].append(Counter(year, count))

    return counts

pprint(import_counts())

Note, however, that I changed the data format to proper CSV (without the surrounding quotes), as shown below.

song, 2000, 184950
boom, 2009, 83729
boom, 2010, 284500
boom, 2011, 203889
pow, 2000, 385920
pow, 2001, 248930

The resulting output looks like this:

{
    'boom': [
        Counter(year=2009, count=83729),
        Counter(year=2010, count=284500),
        Counter(year=2011, count=203889)
    ],
    'pow': [
        Counter(year=2000, count=385920), 
        Counter(year=2001, count=248930)
    ],
    'song': [
        Counter(year=2000, count=184950)
    ]
}

Note that the code above does not validate its input and will raise an error if given invalid CSV.
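
If malformed rows should be tolerated rather than crash the import, one possible approach is sketched below. It swaps the manual split for the standard csv module, and it is not part of the answer above: the function name import_counts_checked, the namedtuple stand-in for Counter, and the choice to collect the numbers of skipped rows are all assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the Counter class defined in the answer.
Counter = namedtuple("Counter", ["year", "count"])

def import_counts_checked(f):
    """Return (counts, skipped), where skipped lists the bad row numbers."""
    counts = defaultdict(list)
    skipped = []
    # skipinitialspace drops the blank that follows each comma
    reader = csv.reader(f, skipinitialspace=True)
    for rowno, row in enumerate(reader, start=1):
        try:
            name, year, count = row  # ValueError unless exactly 3 fields
            record = Counter(int(year), int(count))  # ValueError if not numeric
        except ValueError:
            skipped.append(rowno)
            continue
        counts[name.strip()].append(record)
    return counts, skipped

data = io.StringIO("song, 2000, 184950\nboom, 2009\nboom, 2010, 284500\npow, x, 1\n")
counts, skipped = import_counts_checked(data)
print(dict(counts))
print(skipped)  # [2, 4]
```

The record is built before the dictionary is touched, so a bad row never leaves an empty list behind under its key.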