How do I iterate over a defaultdict(list) in Python?

Asked: 2011-12-27 15:58:27

Tags: python loops dictionary iterator defaultdict

How do I iterate over a defaultdict(list) in Python? Is there a better way to work with a dictionary of lists in Python? I tried the usual iter(dict), but I got this error:

>>> import para
>>> para.print_doc('./sentseg_en/essentials.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "para.py", line 31, in print_doc
    for para in iter(doc):
TypeError: iteration over non-sequence

Main script:

import para
para.print_doc('./foo/bar/para-lines.txt')

para.py:

# -*- coding: utf-8 -*-
## Modified paragraph into a defaultdict(list) structure
## Original code from http://code.activestate.com/recipes/66063/
from collections import defaultdict
class Paragraphs:
    import sys
    reload(sys)
    sys.setdefaultencoding('utf-8')
    # Separator here refers to the paragraph separator,
    #  the default separator is '\n'.
    def __init__(self, filename, separator=None):
        # Set separator if passed into object's parameter,
        #  else set default separator as '\n'
        if separator is None:
            def separator(line): return line == '\n'
        elif not callable(separator):
            raise TypeError, "separator argument must be callable"
        self.separator = separator
        # Reading lines from files into a dictionary of lists
        self.doc = defaultdict(list)
        paraIndex = 0
        with open(filename) as readFile:
            for line in readFile:
                if separator(line):  # call the predicate; comparing the line to the function is always False
                    paraIndex+=1
                else:
                    self.doc[paraIndex].append(line)

# Prints out populated doc from txtfile
def print_doc(filename):
    text = Paragraphs(filename)
    for para in iter(text.doc):
        for sent in text.doc[para]:
            print "Para#%d, Sent#%d: %s" % (
                para, text.doc[para].index(sent), sent)

For example, the contents of ./foo/bar/para-lines.txt look like this:

This is a start of a paragraph.
foo barr
bar foo
foo foo
This is the end.

This is the start of next para.
foo boo bar bar
this is the end.

The output of the main script should look like this:

Para#1,Sent#1: This is a start of a paragraph.
Para#1,Sent#2: foo barr
Para#1,Sent#3: bar foo
Para#1,Sent#4: foo foo
Para#1,Sent#5: This is the end.

Para#2,Sent#1: This is the start of next para.
Para#2,Sent#2: foo boo bar bar
Para#2,Sent#3: this is the end.

5 Answers:

Answer 0 (score: 4)

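The problem you have with

for para in iter(doc):

is that doc is an instance of Paragraphs, not a defaultdict. The defaultdict you create in the __init__ method goes out of scope and is lost. So you need to do two things:

  1. Save the doc created in the __init__ method as an instance variable (e.g. self.doc).

  2. Either make Paragraphs itself iterable (by adding an __iter__ method), or allow access to the created doc object.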
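
The two steps above can be sketched roughly as follows. This is a minimal illustration written for Python 3, not the asker's actual module: it assumes the class receives an iterable of lines rather than a filename, treats blank lines as paragraph separators, and starts numbering at 1. The name `para_index` and the sample data are hypothetical.

```python
from collections import defaultdict


class Paragraphs:
    """Groups lines into paragraphs; a blank line starts a new paragraph."""

    def __init__(self, lines):
        # Step 1: keep the dictionary on the instance so it outlives __init__.
        self.doc = defaultdict(list)
        para_index = 1  # assumption: paragraphs are numbered from 1
        for line in lines:
            if line.strip() == "":
                para_index += 1
            else:
                self.doc[para_index].append(line.rstrip("\n"))

    def __iter__(self):
        # Step 2: delegate iteration to the dictionary's (key, list) pairs.
        return iter(self.doc.items())


sample = ["first line\n", "second line\n", "\n", "third line\n"]
paragraphs = Paragraphs(sample)
for number, sentences in paragraphs:
    print(number, sentences)
```

With `__iter__` in place, both `for x in paragraphs` and `for key in paragraphs.doc` work, so callers can pick whichever reads better.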

Answer 1 (score: 2)

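The recipe you linked to is rather old. It was written in 2001, before Python had more modern tools such as itertools.groupby (introduced in Python 2.4, released in late 2004). Here is what the code could look like using groupby: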

import itertools
import sys

with open('para-lines.txt', 'r') as f:
    paranum = 0
    for is_separator, paragraph in itertools.groupby(f, lambda line: line == '\n'):
        if is_separator:
            # we've reached paragraph separator
            print
        else:
            paranum += 1
            for n, sentence in enumerate(paragraph, start = 1):
                sys.stdout.write(
                    'Para#{i:d},Sent#{n:d}: {s}'.format(
                        i = paranum, n = n, s = sentence))
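
For readers on Python 3, the same groupby idea might look like the sketch below. It is an adaptation, not the answer's original code: it works on an in-memory list instead of a file, collects the formatted rows in a list, and the function name `number_sentences` is made up for illustration.

```python
import itertools

lines = [
    "This is a start of a paragraph.\n",
    "foo barr\n",
    "\n",
    "This is the start of next para.\n",
    "this is the end.\n",
]


def number_sentences(lines):
    """Return 'Para#i,Sent#n: text' rows, splitting paragraphs on blank lines."""
    output = []
    paranum = 0
    # groupby yields consecutive runs of lines that share the same key:
    # True for separator lines, False for paragraph lines.
    for is_separator, paragraph in itertools.groupby(lines, lambda line: line == "\n"):
        if is_separator:
            continue  # skip the run of blank lines between paragraphs
        paranum += 1
        for n, sentence in enumerate(paragraph, start=1):
            output.append(
                "Para#{:d},Sent#{:d}: {}".format(paranum, n, sentence.rstrip("\n"))
            )
    return output


for row in number_sentences(lines):
    print(row)
```

Because groupby merges consecutive separator lines into a single group, several blank lines in a row do not produce empty paragraphs.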

Answer 2 (score: 0)

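The problem seems to be that you are iterating over your Paragraphs class, not the dictionary. Also, consider using

for (key, value) in d.items():

instead of iterating over the keys and then accessing the dictionary entries.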
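
As a small illustration of the items() suggestion, here is a sketch in Python 3 with made-up keys and values. A defaultdict(list) iterates like an ordinary dict, so each step yields a key together with its list.

```python
from collections import defaultdict

d = defaultdict(list)
d["fruit"].append("apple")
d["fruit"].append("pear")
d["veg"].append("leek")

pairs = []
# items() yields (key, list) pairs; no second lookup by key is needed.
for key, values in d.items():
    for value in values:
        pairs.append((key, value))

print(pairs)
```

One practical benefit: iterating with items() never creates new entries, whereas looking up a missing key with `d[key]` on a defaultdict silently inserts an empty list.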

Answer 3 (score: 0)

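It fails because you have not defined __iter__() in the Paragraphs class, and then you try to call iter(doc) (where doc is a Paragraphs instance).

To be iterable, a class must have an __iter__() method that returns an iterator. See the Python documentation on __iter__().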
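
The difference can be seen in a tiny Python 3 sketch. The class names `WithoutIter` and `WithIter` are hypothetical and only serve to contrast the two cases.

```python
class WithoutIter:
    """Wraps a list but defines neither __iter__ nor __getitem__."""

    def __init__(self):
        self.items = ["a", "b"]


class WithIter:
    """Same data, but iterable because __iter__ returns an iterator."""

    def __init__(self):
        self.items = ["a", "b"]

    def __iter__(self):
        # Return an iterator over the wrapped list.
        return iter(self.items)


try:
    iter(WithoutIter())
    failed = False
except TypeError:
    # iter() rejects objects that do not support iteration.
    failed = True

print(failed)
print(list(WithIter()))
```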

Answer 4 (score: 0)

I can't think of any reason to use a dictionary here, let alone a defaultdict. A list of lists would be much simpler.

doc = []
with open(filename) as readFile:
    para = []
    for line in readFile:
        if line == separator:
            doc.append(para)
            para = []
        else:
            para.append(line)
    doc.append(para)
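
To produce the numbered output the question asks for from a list of lists, nested enumerate calls may be enough. The sketch below is written for Python 3 and makes a few assumptions that are not in the answer: `separator` is the string '\n', the input is an in-memory list of lines, empty paragraphs are skipped, and the helper name `split_paragraphs` is made up.

```python
def split_paragraphs(lines, separator="\n"):
    """Split lines into a list of paragraphs (each a list of lines)."""
    doc = []
    para = []
    for line in lines:
        if line == separator:
            if para:  # assumption: ignore empty paragraphs
                doc.append(para)
            para = []
        else:
            para.append(line.rstrip("\n"))
    if para:
        doc.append(para)
    return doc


lines = ["This is a start.\n", "foo foo\n", "\n", "Next para.\n", "foo foo\n"]
doc = split_paragraphs(lines)

report = []
# enumerate(..., start=1) supplies the 1-based paragraph and sentence numbers.
for p, para in enumerate(doc, start=1):
    for s, sent in enumerate(para, start=1):
        report.append("Para#%d,Sent#%d: %s" % (p, s, sent))

print("\n".join(report))
```

Using enumerate also avoids calling list.index(sent) as the question's print_doc does, which returns the position of the first matching line and so gives the wrong number when a paragraph contains duplicate lines.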